All posts by Mauro Krikorian

Hive queries on complex JSON

Lately, I've been immersed in some Big Data projects. One of the issues I had to solve was related to getting logs from JSON using a Hive query. You can find many resources on the web showing how to deal with JSON in a Hive query, but there aren't many good examples for very custom data. As a starting point, if you want to handle JSON you can use one of the SerDe packages around that will help you deal with it (for example, the hive-json-serde project you can find on Google Code). This SerDe is pretty cool for parsing simple JSON objects, but what if you need to deal with arrays, maps or nested complex structures?

Then you can continue your research and find another cool project that allows parsing arrays and nested structures within JSON objects, and also works with arrays/maps from the JAVA world when you might need to serialize your data. You can take a look at this project: rcongiu/Hive-JSON-Serde.

This project is great to deal with nested structures and arrays within JSON objects as shown below:

[sourcecode language=”java”]
{ "country" : "Switzerland", "languages" : ["German", "French", "Italian"], "religions" : { "catholic" : [10,20], "protestant" : [40,50] } }
[/sourcecode]

but rows that are themselves JSON arrays are also supported – for example:

[sourcecode language=”java”]
[{ "blogID" : "FJY26J1333", "data" : "20120401", "name" : "vpxnksu" }]
[{ "blogID" : "VSAUMDFXFD", "data" : "20120401", "name" : "yhftrcx" }]
[/sourcecode]

As you've seen, this one solves many of the issues you might find, but what if your data is formatted in a way that it can't handle?

Let's suppose, as in my case, that you want to deal with a JSON array at the root level containing N JSON objects. How can you deal with that?

Consider the example below:

[sourcecode language=”java”]
[{ "blogID" : "FJY26J1333", "data" : "20120401", "name" : "vpxnksu" }, { "blogID" : "VSAUMDFXFD", "data" : "20120401", "name" : "yhftrcx" }]
[/sourcecode]

As you dive into the rcongiu/Hive-JSON-Serde code, you see that for each record received as input to be processed, you get one record as output: a JAVA object that Hive can manipulate. This leads to getting one row with only the first object ({ "blogID" : "FJY26J1333", "data" : "20120401", "name" : "vpxnksu" } – if your input is the example above). Why is this happening? Because the default file format mapper is used in this case, and it interprets each line as one input row for the SerDe within the mapper stage. What can you do to work around this issue? Well, one simple answer would be: implement your own file format mapper!

This is the point where you need to write some code to deal with your custom format. You can see in the CREATE TABLE statement definition that it supports a file_format as well as a row_format (which you have already been using to deal with JSON by linking it to the SerDe package of your choice). So, what do you have to implement to provide a class for the INPUTFORMAT and OUTPUTFORMAT file_format clauses?

Well, you will need to implement logic for the InputFormat<K,V> and OutputFormat<K,V> interfaces. Fortunately, there is an abstract base class for each one of them (FileInputFormat<K,V> and FileOutputFormat<K,V>) that you can use as a starting point.

In the rest of this post I'll show you how you can easily implement the InputFormat<K,V> interface to deal with your custom data, allowing you to define and execute a SELECT clause over it.

So, for the InputFormat interface I inherited from the FileInputFormat base class and called my custom format class 'JsonInputFormat', as you can see below:

[sourcecode language=”java”]
public class JsonInputFormat extends FileInputFormat<LongWritable, Text>
implements JobConfigurable {

public void configure(JobConf conf) {
}

protected boolean isSplitable(FileSystem fs, Path file) {
return true;
}

public RecordReader<LongWritable, Text> getRecordReader(
InputSplit genericSplit, JobConf job, Reporter reporter)
throws IOException {

reporter.setStatus(genericSplit.toString());

try {
return new JsonInputRecordReader(job, genericSplit);
} catch (JSONException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return null;
}
}
}
[/sourcecode]

The main purpose of this class is to create and provide an instance of org.apache.hadoop.mapred.RecordReader<K,V>, which will be the interaction point with Hive. For this interface, I implemented a very simple class that loads the file from the input split metadata and parses it using the parsing capabilities of the rcongiu/Hive-JSON-Serde package (this is not strictly necessary, as you could use any JSON parser you prefer and return JSON objects as text, but since I've been working with it I used it here as well).

You can see this class implementation below:

[sourcecode language=”java”]
public class JsonInputRecordReader implements RecordReader<LongWritable, Text> {

private JSONArray jsonArray;
private int pos = 0;

public JsonInputRecordReader(Configuration job, InputSplit split)
throws IOException, JSONException {
if (split instanceof FileSplit) {
final Path file = ((FileSplit) split).getPath();
FileSystem fs = file.getFileSystem(job);
FSDataInputStream fileIn = fs.open(file);
try {
// assumes the whole JSON array is contained on a single line of the file
this.jsonArray = new JSONArray(fileIn.readLine());
} finally {
fileIn.close();
fs.close();
}
}
}

@Override
public void close() throws IOException {
}

@Override
public LongWritable createKey() {
return new LongWritable();
}

@Override
public Text createValue() {
return new Text();
}

@Override
public long getPos() throws IOException {
return this.pos;
}

@Override
public float getProgress() throws IOException {
// cast to avoid integer division always returning 0
return (float) pos / this.jsonArray.length();
}

@Override
public boolean next(LongWritable key, Text value) throws IOException {
if (pos < this.jsonArray.length()) {
key.set(pos);

try {
value.set(this.jsonArray.getJSONObject(pos).toString());
} catch (JSONException e) {
// TODO Auto-generated catch block
e.printStackTrace();
value.set("{ ‘error': ‘" + e.getMessage() + "’ }");
}

pos++;

return true;
}

return false;
}
}
[/sourcecode]

The key points of this class are the constructor and the next() method. In the constructor, as you can see, I'm loading the file that contains the JSON array and giving its content to the JSONArray constructor (this class comes from the SerDe library and actually parses the input). Internally, the instance maintains an index (the pos variable), and when Hive asks for the next() record I just return the required element from the JSONArray object created before. As you can also see, the return format is just plain text, so there is no need to use complex JSON parsing libraries at this point – just one that can handle objects within an array and return them.

Having this in place you can define your table as follows:

[sourcecode language=”sql”]
CREATE EXTERNAL TABLE my_table(blogid STRING, data INT, name STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.mk.jsonserde.JsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/my_location'
[/sourcecode]

Note: As you can see, I'm using a default OUTPUTFORMAT implementation since I don't want to write results anywhere else. In this case, the one I picked discards the keys, as I don't need them either.

I hope these tips help you walk your own path to using Hive against custom logs… I'll let you investigate the implementation of the OutputFormat in case you want to write results to a specific place :)

Building a custom Single Sign-On module for JAVA web apps

The last few weeks I've been working on a project that involved Single Sign-On for JAVA web applications. The main purpose of the project was to create a smooth, simple and configurable library that provides SSO to existing JAVA web apps and integrates as easily as possible. In this post I'll share useful links that provide all the information you need to embark on a similar journey, and some tips that will help you along the way. Along with these, you will find some code that can be used for specific features you have to implement.

To develop and test the library I used the following platforms, environment and technologies: the development tool was Eclipse (Kepler) on a Windows system, and the server selected to host the apps was Tomcat 7. Of course you'll need a JAVA SDK and an STS (in this case an ADFS 2.0 instance) to provide the security tokens for authentication, so most of the advice within this article will be related to ADFS.

First of all, if you're new to this world, you can start by reading about the protocols commonly involved, like WS-FED, SAML1 and SAML2. The specifications and introductory articles for each one give an overall idea of the protocol and, by following the links within them, you can get a deeper understanding of what is happening under the hood (as you might be implementing authentication requests, logouts, etc.).

The next step is to get a library that provides most of the required functionality and the pieces you need to build your own solution on top of it. I found OpenSAML for JAVA to be one of the best choices for this (http://shibboleth.net/products/opensaml-java.html). As the library description says, it does not provide a full implementation of any of the high-level entities such as Identity Providers, Service Providers and/or Advanced Clients – and the latter is the focus of this article, as we will be talking about clients performing SSO.

Integration and SSO support on the client side can be achieved by implementing the JAVA filter interface (javax.servlet.Filter), which will intercept requests and inspect whether a user is already authenticated within the service provider, or whether a token is being posted by the STS as a result of an authentication request made earlier on behalf of the SP.

From this fact you can deduce that you will need custom logic for three main things:

  • Maintain the credentials of an already authenticated user – more likely by implementing a Principal interface (java.security.Principal)
  • Write custom code to validate a posted token in order to authenticate the user and create the Principal instance (this is where you could rely on the OpenSAML library)
  • Initiate new authentication requests automatically using a custom configured STS (that can be done by just a couple of helper classes)

I will not get into the details of how to implement the Principal interface or how to handle the Principal instance after a posted token has been successfully validated; essentially, you will store the list of received claims along with the Principal instance, and you will keep this object alive as long as required (within a session, a cookie or another persistence mechanism that allows some kind of TTL/expiration).

You could have something like shown below:

[sourcecode language=”java”]
public class ClaimsPrincipal implements Principal, Serializable {
private static final long serialVersionUID = 6348413065734100547L;

protected List<Claim> claims = null;
.
.
.
public List<Claim> getClaims() {
return Collections.unmodifiableList(this.claims);
}
}
[/sourcecode]

And for the logic required to validate the posted token and obtain the claims, you can create a TokenValidator class (or whatever you want to call it) that relies on the functionality provided by the OpenSAML library. This class can have a single public method called something like validate(), which returns the list of claims you need to create the Principal instance signaling an authenticated state. This is the most complex task and I will not get into details, but you can find useful resources and samples using the OpenSAML library for JAVA on the web.

The idea of authenticating and managing a user can be summarized with the following pseudo-code:

[sourcecode language=”java”]
TokenValidator validator = new TokenValidator();
try {
List<Claim> claims = validator.validate(postedToken);
PrincipalStore.store(new ClaimsPrincipal(claims));
}
catch (Exception ex) {
// do something with the exception (e.g. log it)
}
[/sourcecode]

And then you can query the Principal store to check whether a user is already authenticated when receiving new requests:

[sourcecode language=”java”]
boolean isAuthenticated = PrincipalStore.retrieve() instanceof ClaimsPrincipal;
[/sourcecode]

But… what if the user is not already authenticated? We can handle this situation and redirect the request to the STS to start an authentication process. You can do this using the WS-FED or SAML2 protocols. The query string required to perform a WS-FED auth request is pretty simple and you can find many examples on the web; for the SAML2 protocol the easiest way is to implement an HTTP Redirect Binding (http://en.wikipedia.org/wiki/SAML_2.0#HTTP_Redirect_Binding).

As the payload for this can be a little tricky when dealing with ADFS, below is the template that I successfully used against our ADFS. You can use it as well to build a SAML2 auth request to ADFS:

[sourcecode language=”xml”]
<samlp:AuthnRequest
    ID="_85e10b08-0f76-4491-be86-07324727c4ed"
    Version="2.0"
    IssueInstant="2013-08-16T17:12:49.719Z"
    Destination="https://[ADFS-URI]/adfs/ls/"
    Consent="urn:oasis:names:tc:SAML:2.0:consent:unspecified"
    xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
  <Issuer xmlns="urn:oasis:names:tc:SAML:2.0:assertion">http://[ADFS-URI]/adfs/services/trust</Issuer>
  <Conditions xmlns="urn:oasis:names:tc:SAML:2.0:assertion">
    <AudienceRestriction>
      <Audience>[service-provider-urn]</Audience>
    </AudienceRestriction>
  </Conditions>
</samlp:AuthnRequest>
[/sourcecode]

As said, with this you can create a valid SAML2 auth request that will be gracefully accepted by ADFS to start a login and post a token back to your service provider. In addition, since ADFS can be configured to require signed auth requests, you might need to add signature parameters to your query string (signature algorithm and signature digest). In that case you will need to get a private key from a certificate store in order to perform, for example, an RSA-SHA1 signature.

Here you can see what the code to extract a private key from a keystore could look like:

[sourcecode language=”java”]
public static PrivateKey getPrivateKey(InputStream keyStoreStream, String alias, String password)
throws KeyStoreException, NoSuchAlgorithmException, CertificateException,
FileNotFoundException, IOException, UnrecoverableEntryException {
KeyStore ks = KeyStore.getInstance("pkcs12");
ks.load(keyStoreStream, password.toCharArray());

KeyStore.PrivateKeyEntry pkEntry = (KeyStore.PrivateKeyEntry) ks.getEntry(alias,
(KeyStore.ProtectionParameter) new KeyStore.PasswordProtection(password.toCharArray()));
PrivateKey pk = pkEntry.getPrivateKey();

return pk;
}
[/sourcecode]

I hope this article has been useful to you and helps you a little in the process of implementing a custom SSO module within your JAVA web app. If you have design/implementation questions you can write me for further details :)

Data Driven Apps – State of the Art Techniques

Recently I've been working with the Patterns & Practices group on a guide about data access, focused on high scalability and availability for data-driven applications. Throughout the book we show several DB families while analyzing their advantages and disadvantages when used in real-life apps.

The book is still in its final phase, but several drops have been done to the community in order to get early feedback.

If you're an enthusiastic software developer or an architect looking for best practices for building highly scalable data-driven applications, I recommend you take a look at the Data Access Guidance on the CodePlex site.

Keep in touch for future releases!

How to easily unit test MVC routes using Fakes

In this short post I will show you how you can easily test your MVC routes using Fakes as the mocking framework. While there are a lot of mocking frameworks you can choose from to unit test your code, Fakes gives you the ability to mock almost any class within the .NET framework, not just interfaces and virtual members.

The following concepts are first class citizens within the Fakes ecosystem:

  • A stub replaces another class with a small substitute that implements the same interface. To use stubs, you have to design your application so that each component depends only on interfaces, and not on other components.
  • A shim modifies the compiled code of your application at run time so that instead of making a specified method call, it runs the shim code that your test provides. Shims can be used to replace calls to assemblies that you cannot modify, such as .NET assemblies.

Basically, stubs are what you find in many other mocking frameworks for .NET, but shims provide extra value by allowing you to modify the compiled code (not of the classes you want to test, but of the classes those ones rely on). You can read more about Fakes and its usage here.
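For instance, the classic shim usage (a minimal sketch, assuming you have generated a Fakes assembly for System in your test project; Y2KChecker is a hypothetical class under test used only for illustration) detours every DateTime.Now call made by the code under test:

[sourcecode language=”csharp”]
[TestMethod]
public void IsAlmostY2K_ReturnsTrue_OnNewYearsEve1999()
{
    // Shims are only active while the ShimsContext is alive.
    using (ShimsContext.Create())
    {
        // Detour DateTime.Now to a fixed, deterministic value.
        System.Fakes.ShimDateTime.NowGet = () => new DateTime(1999, 12, 31, 23, 59, 0);

        // Y2KChecker is a hypothetical class under test.
        Assert.IsTrue(Y2KChecker.IsAlmostY2K());
    }
}
[/sourcecode]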

Here I'll show you how you can mock base classes within the ASP.NET pipeline such as HttpContextBase, HttpRequestBase and HttpResponseBase in order to set up everything MVC needs to resolve a route.

So you can add a private method to your fixture setting up these mocks like the following one:

[sourcecode language=”csharp”]
private IDisposable MocksForMVCRouteTesting(string requestUrl, out HttpContextBase contextBase)
{
var shimsContext = ShimsContext.Create();
var requestBase = new System.Web.Fakes.StubHttpRequestBase
{
ApplicationPathGet = () => "/",
AppRelativeCurrentExecutionFilePathGet = () => requestUrl
};

contextBase = new System.Web.Fakes.StubHttpContextBase
{
RequestGet = () => requestBase,
ResponseGet = () => new System.Web.Fakes.StubHttpResponseBase()
};

return shimsContext;
}
[/sourcecode]

OK, now you have your mocks to test the routes, but how do you use them? Easy, just follow this pattern:

[sourcecode language=”csharp”]
// Act
HttpContextBase context;
using (this.MocksForMVCRouteTesting("{mvc_route_to_text_here}", out context))
{
var routeData = routes.GetRouteData(context);

// Asserts here…
}
[/sourcecode]

And what about a real-life example?

[sourcecode language=”csharp”]
// Arrange
var routes = new RouteCollection();
RouteConfig.RegisterRoutes(routes);

// Act
HttpContextBase context;
using (this.MocksForMVCRouteTesting("~/account/register/", out context))
{
var routeData = routes.GetRouteData(context);

// Assert
Assert.IsNotNull(routeData);
Assert.AreEqual("Account", (string)routeData.Values["controller"], true);
Assert.AreEqual("Register", (string)routeData.Values["action"], true);
Assert.IsTrue(string.IsNullOrWhiteSpace(routeData.Values["id"].ToString()));
}
[/sourcecode]

Hope that you can find this useful :).

More NoSql stuff :: Couchbase from C#

Continuing with my incursion into the NoSql database world, these days I've been playing with Couchbase installed on Windows 8 (after tackling several issues). Finally, after downloading and installing it successfully, I created a simple console application in C# to test its basics (as I did in my previous articles 'Using MongoDB from C#' and 'A little of RavenDB' a few days ago).

Couchbase turns out to be a document-oriented store that works as a key-value DB when you interact with its HTTP RESTful API. It stores JSON documents, and supports sharding, replication, and many other things that we can find these days in most NoSql databases. You can go to the Couchbase home site and learn more about it.

As I said before, in order to try some of its features, I’ve created a console application (the client driver is very simple to add to your solution – just using NuGet).

First of all you need to get a configured client instance before interacting with the DB. So let's start by adding the client NuGet package to your project, and adding a new section for Couchbase within your .config file:

[sourcecode language=”xml”]
<configSections>
<section name="couchbase" type="Couchbase.Configuration.CouchbaseClientSection, Couchbase"/>
</configSections>

<couchbase>
<servers bucket="default" bucketPassword="">
<add uri="http://127.0.0.1:8091/pools/default"/>
</servers>
</couchbase>
[/sourcecode]

Now you can get a client instance from your app just this easy:

[sourcecode language=”csharp”]
var client = new CouchbaseClient();
[/sourcecode]

Let’s perform some simple CRUD operations like adding and getting a document. Having the following Book class defined:

[sourcecode language=”csharp”]
[Serializable]
public class Book
{
public string Title { get; set; }
public string[] Authors { get; set; }
public decimal Price { get; set; }
public bool Recommended { get; set; }
}
[/sourcecode]

How do we create and add a new book to the store? Simple, just like the following:

[sourcecode language=”csharp”]
var bookId = "f1637bb1-cd16-40bd-9ae7-e267d58ff62f";
client.StoreJson(StoreMode.Add, bookId, new Book
{
Title = "For Whom the Bell Tolls",
Authors = new string[] { "Ernest Hemingway" },
Recommended = true,
Price = 15.64M
});
[/sourcecode]

In this example I have chosen the key to be a GUID, but you can use any string set you want to define your own key domain. I have also used a StoreJson() method that is not part of the CouchbaseClient class interface; you can take advantage of it through the static classes in the Couchbase.Extensions namespace, which contain some extension methods for this class.

And what’s the code to retrieve? Some kind of GetJson()? Exactly! Actually a GetJson<T>() extension method:

[sourcecode language=”csharp”]
var book = client.GetJson<Book>(bookId);
[/sourcecode]

Note: the CouchbaseClient class contains simple Store() and Get<T>() methods, so why didn't I use them? Well, if you use these methods the client will store the document instance as binary within a JSON document on the server. Having a readable JSON document stored on the server is helpful for performing server-side operations over the documents (basically MapReduce operations).

But now, how can you retrieve, or perform a custom operation and get its results over, a set of documents? Well, as you may have guessed, the answer here (as in many other NoSql databases) is the MapReduce concept I mentioned before. Couchbase has a concept very similar to a relational DB's Views to resolve this kind of thing and, as in an RDBMS, it allows you to obtain a subset or the full set of documents from the store (besides performing aggregation operations). Views in Couchbase iterate over a specific bucket, and they can be expensive to build if the dataset you selected is big, but once created they are automatically updated whenever data changes (with low performance impact). To store the views this DB has a design document concept (as found in CouchDB – where you can store your views within a design document or run temporary views, with the resulting performance degradation since temporary views execute when you send them to the server – refer to the CouchDB technical documentation for more info).

To create a new view you can go to your administrative console at http://localhost:8091/ and click on the Views tab to create a new design document and add some views to it, but I'll show you later how you can achieve this from your client app.

Let's first create a view that will be useful for searching by book title, as I did in previous articles: a BookByTitle view/index. I've created a JavascriptFunctions class within my application to contain (as strings) the Javascript functions I need for the MapReduce operations, although you can store them wherever you want. Here it is, containing the mapping function for getting books by title:

[sourcecode language=”csharp”]
public static class JavascriptFunctions
{
public static string Dev_Books_ByTitle_Map
{
get
{
return @"function (doc, meta) {
if (doc.title && doc.authors) {
emit(doc.title, null);
}
}";
}
}
}
[/sourcecode]

As you can see, within the function I'm checking for documents that contain a title and authors, since the documents within the bucket can be of any kind and the map function will receive all the documents in the bucket (if they contain at least a title and authors there is a big chance that they are books – you could also split your documents among different buckets or choose to have some kind of 'document type' identifier).

If you add this view to the server you can use it from the client to get a range of books by title for example, as shown below:

[sourcecode language=”csharp”]
var booksByTitle = client.GetView<Book>("dev_books", "bytitle", true);
foreach (var book in booksByTitle.StartKey("S"))
{
Console.WriteLine(string.Format("* Book retrieved: {0}", book));
}
[/sourcecode]

Using the GetView<T>() method you can instruct the server that you want to get the results from the view (i.e. the map or reduce function results), but by setting the last method parameter (shouldLookupDocById) to true you will get the documents as well. And there I'm showing the books whose titles start with 'S', until the end of the collection (remember that there is an index built underneath).

But what if I now want to count books by author, as I did in previous articles? I'll just need to add another view that maps the authors' names with their book count from each document, and reduce these results grouping by the key (the author in this case)… as with the other NoSql DBs I worked with, obtaining this result is a piece of cake. So let's add two new 'functions' to my JavascriptFunctions class that will map and reduce authors with their book counts:

[sourcecode language=”csharp”]
public static class JavascriptFunctions
{
public static string Dev_Books_ByTitle_Map
{
get
{
return @"function (doc, meta) {
if (doc.title && doc.authors) {
emit(doc.title, null);
}
}";
}
}

public static string Dev_Books_AuthorWithBookCount_Map
{
get
{
return @"function (doc, meta) {
if (doc.authors && doc.recommended) {
for (var i in doc.authors) {
emit(doc.authors[i], 1);
}
}
}";
}
}

public static string Dev_Books_AuthorWithBookCount_Reduce
{
get
{
return @"function (keys, values, rereduce) {
if (!rereduce) {
return sum(values);
}
}";
}
}
}
[/sourcecode]

Now I have two new Javascript functions defined in this class: Dev_Books_AuthorWithBookCount (map & reduce versions). If you pay attention to the Map version, you can see that I'm only interested in books that are 'recommended' (and again, to know that they are books I'm checking their authors property). From the client app you can get the results in the following way:

[sourcecode language=”csharp”]
var countPerAuthor = client.GetView("dev_books", "authorwithbookcount").Group(true);
foreach (var row in countPerAuthor)
{
Console.WriteLine("* Author {0} has {1} books", row.Info["key"], row.Info["value"]);
}
[/sourcecode]

Please note that I'm using the Group() method from the IView interface to indicate to the server that the reduce function should operate on each one of the keys. You can get more information here (and see how to handle the rereduce parameter – something I'm not doing here since I don't expect any).
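If you did expect rereduce calls (for bigger datasets the server may combine partial results), one way you could handle it – just a sketch, relying on the sum() helper that the Couchbase view engine provides to reduce functions – is to sum in both cases, since on a rereduce the incoming values are already partial counts per key:

[sourcecode language=”csharp”]
public static string Dev_Books_AuthorWithBookCount_ReduceWithRereduce
{
    get
    {
        // When rereduce is false, 'values' holds the 1s emitted by the map function;
        // when it is true, 'values' holds partial sums from earlier reduce calls.
        // Summing them works for both phases.
        return @"function (keys, values, rereduce) {
            return sum(values);
        }";
    }
}
[/sourcecode]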

Well… up to now the client code seems pretty simple for getting view results, but how can we test it so far? Until now you would have been creating the respective views within the administration console (under the Views tab).

At first glance, it seems the C# SDK doesn't have an API to manage views. Nevertheless, as the design document that contains the view definitions is just another document within the DB, I'll show you how you can create it with a PUT request. You can achieve the 'trick' easily with the following code, which relies on the WebClient class from the .NET framework to do the job. But first let's add the design document's body (JSON) to the JavascriptFunctions class to have it at hand:

[sourcecode language=”csharp”]
public static string Dev_Books_DesignDocument
{
get
{
return @"{""views"":{
""bytitle"":{
""map"":""function (doc, meta) {n if (doc.title && doc.authors) { n emit(doc.title, null); n }n}""
},
""authorwithbookcount"":{
""map"":""function (doc, meta) {n if (doc.authors && doc.recommended) {n for (var i in doc.authors) {n emit(doc.authors[i], 1); n }n }n}"",
""reduce"":""function (keys, values, rereduce) {n if (!rereduce) {n return sum(values); n }n}""}}
}";
}
}
[/sourcecode]

Note: This is a temporary solution to have the design document body at hand, but I recommend you create a C# class that can be mapped to that JSON body and serialize it before making the request. That would be a better, and more elegant, way to do this.
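As a rough sketch of that more elegant approach (assuming Json.NET/Newtonsoft.Json for serialization – any JSON serializer would do – and reusing the Javascript functions defined earlier), the design document could be modeled and serialized like this:

[sourcecode language=”csharp”]
// Hypothetical classes mirroring the design document JSON shown above.
public class ViewDefinition
{
    [JsonProperty("map")]
    public string Map { get; set; }

    [JsonProperty("reduce", NullValueHandling = NullValueHandling.Ignore)]
    public string Reduce { get; set; }
}

public class DesignDocument
{
    [JsonProperty("views")]
    public Dictionary<string, ViewDefinition> Views { get; set; }
}

// Building the body instead of hand-writing the JSON string:
var designDocument = new DesignDocument
{
    Views = new Dictionary<string, ViewDefinition>
    {
        { "bytitle", new ViewDefinition { Map = JavascriptFunctions.Dev_Books_ByTitle_Map } },
        { "authorwithbookcount", new ViewDefinition
            {
                Map = JavascriptFunctions.Dev_Books_AuthorWithBookCount_Map,
                Reduce = JavascriptFunctions.Dev_Books_AuthorWithBookCount_Reduce
            }
        }
    }
};

var designDocumentJson = JsonConvert.SerializeObject(designDocument);
[/sourcecode]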

Anyway, sticking with the raw JSON body for now, we can make the PUT request:

[sourcecode language=”csharp”]
var url = "http://localhost:8091/couchBase/default/_design/dev_books";
var webRequest = new WebClient();
try
{
webRequest.DownloadData(url);
}
catch (WebException ex)
{
if (ex.Response is HttpWebResponse && ((HttpWebResponse)ex.Response).StatusCode.Equals(HttpStatusCode.NotFound))
{
var data = Encoding.ASCII.GetBytes(JavascriptFunctions.Dev_Books_DesignDocument);
webRequest.Headers.Add("Content-Type", "application/json");
webRequest.UploadData(url, "PUT", data);
}
else
throw;
}
[/sourcecode]

We are making a PUT request to the default bucket with the _design/dev_books id. If you go to the Views tab within the administration console you should see the design document and its views. In the code I'm first trying to get the design document, and if it doesn't exist within the DB (404 – not found) I'm creating it.

You can find the sample code here. As usual, I recommend you go to the DB home site, download it and try it yourself (you can do it from C#, using any other language found there, or simply by using the REST interface the server exposes).

That’s all folks!!

Installing Couchbase under Windows 8

The following are the workaround steps that worked for me when installing Couchbase (Enterprise Edition) under the Windows 8 operating system. After following these steps I've been able to get a healthy instance running.

The workarounds this article is based on can be found online, so here I'm going to keep it simple and just write a list of steps (you can refer to the articles listed at the end of this post if you need more information). So, after installing the package you downloaded, a CouchbaseServer service (Windows Service) instance should be running as shown below:

[Screenshot: the CouchbaseServer Windows service running]

Execute the following steps in order to get a working instance (if it’s not working yet):

  1. Stop the service
  2. Edit your Windows hosts file with Administrator rights
  3. Add the following line: 127.0.0.1    couchbase-node1
  4. Save and close the hosts file
  5. Now, relative to the root folder where you installed Couchbase, edit .\Server\bin\service_register.bat with Administrator rights
  6. Update the line that sets the NS_NAME var to be: set NS_NAME=ns_1@couchbase-node1
  7. Update the line next to it to also set NS_NAME=ns_1@couchbase-node1
  8. Save and close the file
  9. From your Couchbase root folder delete the contents of .\Server\var\lib\couchbase\mnesia
  10. Now run .\Server\bin\service_unregister.bat as Administrator (from your root folder)
  11. Next run the .\Server\bin\service_register.bat file you edited
  12. At this point, if you run the service and check the listening sockets (netstat -a), you should see ports 8091 & 8092
  13. To get memcached running healthily, replace libtcmalloc_minimal-4.dll (at .\Server\bin) with a previous version (this worked for me)
  14. And finally, start the service

After all these steps have been performed you should have access to your administrative console under IE (http://localhost:8091/) and see everything running OK as shown below:

[Screenshot: the Couchbase administrative console showing a healthy node]

Related articles:

IP Address related

Allocation library related (article1, article2)

A little of RavenDB

Hi, a few days ago I published an article showing how you can easily use MongoDB from C#. In this article I plan to show mostly the same operations and features, but using RavenDB. As this database is built in C#, it is easy to integrate using a native C# client.

While both databases have very similar features, RavenDB supports ACID transactions under the unit of work pattern. For the full set of features that both databases support, and a comparison, I suggest you go to their respective home sites and read more about them.

So, in order to try some of its features, I’ve created a console application using RavenDB (after installing the server, the client driver is very simple to add to your solution – just using NuGet).

First of all you need to get a document store instance and configure it before using sessions; after you've downloaded the client NuGet package you can do it like this:

[sourcecode language=”csharp”]
var documentStore = new DocumentStore { Url = "http://localhost:8080/" };
documentStore.DefaultDatabase = "mauro";
documentStore.Conventions.FindTypeTagName = (t) => t.Equals(typeof(City)) ? "Cities" : string.Format("{0}s", t.Name);
documentStore.Initialize();
[/sourcecode]

Now you can perform, for example, a book-adding operation. You will have to do it within a session (which will track all your changes to objects, as an ORM does, and commit or discard the whole thing):

[sourcecode language=”csharp”]
using (var session = documentStore.OpenSession())
{
session.Store(new Book
{
Title = "For Whom the Bell Tolls",
Authors = new string[] { "Ernest Hemingway" },
Price = 15.64M
});
session.SaveChanges();
}
[/sourcecode]

Where your Book class is defined as:

[sourcecode language=”csharp”]
public class Book
{
public string Title { get; set; }
public string[] Authors { get; set; }
public decimal Price { get; set; }
}
[/sourcecode]

How can we iterate over our whole book collection? Well, as RavenDB is 'Safe by Default' – meaning, among other things, that you can NOT do a 'SELECT * from Books' – you will need to paginate through your collection, or take another approach for custom queries and/or aggregation operations.
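For instance, a minimal paging sketch (128 is just an arbitrary page size here, not a value mandated by RavenDB) could look like this:

[sourcecode language=”csharp”]
// Paging through the whole Books collection within the 'safe by default' limits.
const int pageSize = 128;
var page = 0;
List<Book> currentPage;

do
{
    using (var session = documentStore.OpenSession())
    {
        currentPage = session.Query<Book>()
            .Skip(page * pageSize)
            .Take(pageSize)
            .ToList();
    }

    foreach (var book in currentPage)
    {
        Console.WriteLine(book.Title);
    }

    page++;
}
while (currentPage.Count == pageSize);
[/sourcecode]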

For custom queries and aggregation operations, RavenDB uses indexes for everything. And what is an index in this DB? Well, in the simplest terms an index here is a MapReduce operation. If you don't have an existing – or static, in RavenDB language – index for a specific query, the server will create a temporary one for you when it needs to resolve the query (I recommend you read more about this in the home site documentation, as indexes are the core of how queries work internally).

So, how can we define a static index to resolve a specific query? This is pretty simple from the C# client using LINQ. You can, for example, set up this index to query all the books in the book collection (but always remember the 'safe' behavior):

[sourcecode language=”csharp”]
using (var session = documentStore.OpenSession())
{
if (session.Advanced.DocumentStore.DatabaseCommands.GetIndex("allBooks") == null)
{
session.Advanced.DocumentStore.DatabaseCommands.PutIndex("allBooks", new IndexDefinitionBuilder<Book>
{
Map = documents => documents.Select(entity => new { })
});
}
}
[/sourcecode]

You can choose to create an IndexDefinition with your map/reduce functions as strings within it, or use the IndexDefinitionBuilder helper (as here) and make things easier, since you can use code-insight to write the LINQ queries and you have all your domain objects at hand. As we are returning an empty object for each one of the iterated books, we are creating something like a 'mock index' over the whole collection.

Tip: you can use this index to delete the book collection with the DeleteByIndex() command as it is shown next:

[sourcecode language=”csharp”]
session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex("allBooks", new IndexQuery());
[/sourcecode]

You can create any index you want this way, just using the PutIndex() command as shown above and the IndexDefinitionBuilder class, but what is the best practice for creating an index?

Well, another way to define and create a static index within the DB is to inherit from the AbstractIndexCreationTask<TDocument> class and set the index properties and behavior within the new class's constructor. Let's use this approach to create an index on the book's title:

[sourcecode language=”csharp”]
public class BooksByTitleIndex : AbstractIndexCreationTask<Book>
{
public BooksByTitleIndex()
{
this.Map = books => books.Select(book => new { Title = book.Title });

//will analyze title to be available on search operations
this.Indexes.Add(x => x.Title, FieldIndexing.Analyzed);
}
}
[/sourcecode]

Besides being the recommended approach to creating static indexes, this one also makes it easy to use them later in queries, where you can explicitly tell the engine which index it should use, or leave that to the query resolver. In the following code you have two queries: the first one does not specify which index to use, and the second one does; anyway, both queries in this case will use the same index: 'BooksByTitleIndex'.

Why? Because the resolver sees that you are using the Title property of your class and it has an index for it.

[sourcecode language=”csharp”]
var book1 = session.Query<Book>()
.Where(b => b.Title.Equals("Seven Databases in Seven Weeks"))
.FirstOrDefault();

var book2 = session.Query<Book, BooksByTitleIndex>()
.Where(b => b.Title.Equals("Seven Databases in Seven Weeks"))
.FirstOrDefault();
[/sourcecode]

But how do we create the index in the DB before executing the queries? Well, you can simply instantiate your index and call its Execute() method, or use a static index creation helper class that takes all types inheriting from the AbstractIndexCreationTask class and creates the indexes for you if they don't exist (it is recommended to do this when initializing):

[sourcecode language=”csharp”]
// create from index classes
IndexCreation.CreateIndexes(typeof(Program).Assembly, documentStore);
[/sourcecode]

Now let's say that you want to sum the prices of all the books in your collection. Well, that's something you could do with just a couple of lines of C# using LINQ, working around the 'safe' behavior by retrieving pages of books (or changing the 'safe' behavior settings – although this is NOT recommended!).

The problem with this? You are transferring all the data to the client, you need to paginate over all the books in your collection, and it's clumsy. The cost will be prohibitive if the collection is huge, and you would be going against the 'safe' behavior the DB has – for sure a very bad design decision.

So let’s create a new index for this using the MapReduce features RavenDB has for us. Again, we are going to define a class for the new index to follow best practices and, in addition, simplify our code after:

[sourcecode language=”csharp”]
public class SumBookPricesIndex: AbstractIndexCreationTask<Book, SumBookPricesIndex.ReduceResult>
{
public class ReduceResult
{
public string SetOfBooks { get; set; }
public decimal SumOfPrices { get; set; }
}

public SumBookPricesIndex()
{
this.Map = books => books.Select(book => new ReduceResult { SetOfBooks = "all", SumOfPrices = book.Price });

this.Reduce = results => results
.GroupBy(result => result.SetOfBooks)
.Select(group => new ReduceResult { SetOfBooks = group.Key, SumOfPrices = group.Sum(groupCount => groupCount.SumOfPrices) });
}
}
[/sourcecode]

Look how a new ReduceResult class is defined within the new index class. This also makes things easier later, and since the MapReduce engine can call reduce with mapped or re-reduced lists, you will need logic that can handle both types of items (this is the simplest way). Refer to the documentation for more information.

How do you get the sum of all the book prices using MapReduce then? It is very simple once you have the index shown above created, just like this:

[sourcecode language=”csharp”]
var sumOfBooksPrices = session
.Query<SumBookPricesIndex.ReduceResult, SumBookPricesIndex>()
.FirstOrDefault();
[/sourcecode]

Here we are also using the ReduceResult class definition to indicate the query the type of the returned objects.

What if for example we want to group the authors in our DB and count their books? The driver does not support the GroupBy() operation (at least using the Query() method exposed by the LINQ interface – but it is supported within the LuceneQuery() advanced method)… So? Just create another index!

[sourcecode language=”csharp”]
public class GroupBookAuthorsIndex : AbstractIndexCreationTask<Book, GroupBookAuthorsIndex.ReduceResult>
{
public class ReduceResult
{
public string Author { get; set; }
public int NumberOfBooks { get; set; }
}

public GroupBookAuthorsIndex()
{
// be aware with linq queries [here SelectMany(authors,new) is not the same as SelectMany(authors).Select(new)]
this.Map = books => books
.SelectMany(book => book.Authors, (book, author) => new ReduceResult { Author = author, NumberOfBooks = 1 });

this.Reduce = results => results
.GroupBy(result => result.Author)
.Select(group => new ReduceResult { Author = group.Key, NumberOfBooks = group.Sum(groupCount => groupCount.NumberOfBooks) });
}
}
[/sourcecode]
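Once this index is in place (created with IndexCreation.CreateIndexes() as before), querying it is analogous to the price sum shown above – a quick sketch:

[sourcecode language=”csharp”]
using (var session = documentStore.OpenSession())
{
    // Each ReduceResult row holds an author and the number of books found for them.
    var countsPerAuthor = session
        .Query<GroupBookAuthorsIndex.ReduceResult, GroupBookAuthorsIndex>()
        .ToList();

    foreach (var result in countsPerAuthor)
    {
        Console.WriteLine("* Author {0} has {1} books", result.Author, result.NumberOfBooks);
    }
}
[/sourcecode]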

To finish this article I’ll show you a cool thing you can do by using RavenDB and taking advantage of the Lucene engine it relies on (in this particular case Lucene.NET): the “Did you mean?” feature (suggestions).

Let's say that you want to look for a book title but only remember an approximate word, or you made a typo when entering your search:

[sourcecode language=”csharp”]
var query = session.Query<Book, BooksByTitleIndex>().Where(book => book.Title.Equals("sells"));
var searchedBook = query.FirstOrDefault();
[/sourcecode]

And you have the following titles in your DB:

  • For Whom the Bell Tolls
  • Cat’s Cradle
  • Slaughterhouse-Five
  • Seven Databases in Seven Weeks

What will be the result (searchedBook variable)? Well, as there is no match for ‘sells‘ the result will be null.

At this point you can show suggestions to the user (using several comparison algorithms, the user's input and your index's tokens [BooksByTitleIndex]) by calling the Suggest() method from the LinqExtensions class:

[sourcecode language=”csharp”]
if (searchedBook == null)
{
Console.WriteLine("Did you mean:");
foreach (var suggestion in query.Suggest(new SuggestionQuery { Accuracy = 0.4f, MaxSuggestions = 5 }).Suggestions)
{
Console.WriteLine("t. {0}?", suggestion);
}
}
[/sourcecode]

The output would be something like this:

Did you mean:

  • tools?
  • bell?
  • seven?

You can find the sample code here. Anyway, this is just the tip of the iceberg; there are a lot of features (document relationships, polymorphic indexes, transformation of results, lazy operations, and many many more, not just related to querying) that RavenDB offers and that are not used in this article. I recommend you download the DB and give it a try.

Using MongoDB from C#

Today I've been playing a little with MongoDB, and after enjoying triggering a few commands from Mongo's console, I decided to download, build and try the C# driver. Building and using it is pretty straightforward, and in a couple of minutes you can be playing with your local database from a C# console application.

So I created a simple console application that inserts and gets a couple of books from a DB collection, in minutes and with just a few simple steps. First of all you need to get a client instance and a server reference from C#; using the driver you can do it like this:

[sourcecode language=”csharp”]
// connect to localhost and get server
var server = new MongoClient().GetServer();
var db = server.GetDatabase("mauro");

// get book collection and clean it
var books = db.GetCollection("books");
[/sourcecode]

From now on you can start performing operations in your books collection, for example adding a new book:

[sourcecode language=”csharp”]
books.Insert(
new BsonDocument
{
{ "Authors", new BsonArray { "Ernest Hemingway" } },
{ "Title", "For Whom the Bell Tolls" },
{ "Price", 31.53 }
});
[/sourcecode]

To learn about the BsonDocument, BsonArray and other types related to this DB, browse its information here.

If you are wondering why the keys for the JSON object are capitalized, you will find the answer in just a few seconds, when a C# Book class is introduced and used with the driver. This is an easy way to work with your domain objects, as you would do if you were using an ORM. In this case I decided to follow the C# convention when naming class properties (except for the _id used by Mongo), as you can see below:

[sourcecode language=”csharp”]
public class Book
{
public BsonObjectId _id { get; set; }
public string Title { get; set; }
public string[] Authors { get; set; }
// decimal is not well-converted by the driver
public double Price { get; set; }
}
[/sourcecode]

Now that we have a Book class defined we can use it within the driver calls and work with this typed domain object. If you use the db.GetCollection<TDocumentType>() getter when instancing a collection of entities you will tell the driver that you want to wrap the BSON objects (managed under the hood) as TDocumentType objects:

[sourcecode language=”csharp”]
var typedBooks = db.GetCollection<Book>("books");
[/sourcecode]

Now you can iterate over all the elements in this collection; each element is returned as a Book instance:

[sourcecode language=”csharp”]
Console.WriteLine("nRetrieving all books (typed collection)");
foreach (var book in typedBooks.FindAll())
{
Console.WriteLine(string.Format("Book retrieved: {0}", book));
}
[/sourcecode]

And, of course, you can also add a new Book instance to the collection in an easier way than having to deal with 'JSON' in C#:

[sourcecode language=”csharp”]
typedBooks.Insert(new Book
{
Title = "Seven Databases in Seven Weeks",
Authors = new string[] { "Eric Redmond", "Jim R. Wilson" },
Price = 35.67
});
[/sourcecode]

At this point all these things are pretty much what any DB can offer (in addition to the ORM features we have from the driver), but what about using a little of MapReduce?

Can I use it from C#? Yes!

Let’s say that you want to sum the prices of all books in your collection. Well that’s something that you can very easily do with just a single line in C# using LINQ:

[sourcecode language=”csharp”]
var allPrices = typedBooks.FindAll().Sum(b => b.Price);
Console.WriteLine(string.Format("Sum of all prices [linq]: ${0}", allPrices));
[/sourcecode]

The problem with this? You are transferring all the data – all the books in your collection – to the client. Think what the cost would be if this collection were huge! In addition, this is a simple process that just sums values, but what if the logic were more complex and required more processing power to obtain the result in a reasonable time?

Fortunately, you can take advantage of the MapReduce engine MongoDB provides, this being the correct way to perform this kind of operation (over your whole dataset). So let's try MapReduce from C#.

First of all, I created a static class to hold my Javascript functions (as MongoDB talks Javascript):

[sourcecode language=”csharp”]
public static class JavascriptFunctions
{
public static string MapBuildAllPricesKey
{
get
{
return "function(obj) { return ‘all'; }";
}
}

public static string MapAllPrices
{
get
{
return @"function() { emit(buildAllPricesKey(this), this.Price); }";
}
}

public static string ReduceAllPrices
{
get
{
return @"function(key, values) {
var allPrices = 0;
for (var i=0; i<values.length; i++) {
allPrices += values[i];
}
return { key: key, books: values.length, total: allPrices };
}";
}
}
}
[/sourcecode]

How can I use these functions now?

Well, the MongoCollection class provides a MapReduce() method that you can use to instruct the server which map and reduce operations you want to perform over a collection.

In addition to just trying MapReduce, I've also decided to take advantage of the possibility of storing a function on the server (a kind of stored procedure in RDBMS terms) and, as you can see, I'm planning to use it within the MapAllPrices() js function. Here's the code to store a function on the server that can be called from other functions you upload:

[sourcecode language=”csharp”]
db.GetCollection("system.js").Save(
new BsonDocument
{
{ "_id", "buildAllPricesKey" },
{ "value", new BsonJavaScript(JavascriptFunctions.MapBuildAllPricesKey) }
});
[/sourcecode]

But how do we finally sum all prices using MapReduce? Easy, just like this:

[sourcecode language=”csharp”]
var mr = books.MapReduce(JavascriptFunctions.MapAllPrices, JavascriptFunctions.ReduceAllPrices);
Console.WriteLine(string.Format("Sum of all prices [map/reduce]: ${0}", mr.GetResults().First()["value"]["total"]));
[/sourcecode]

The mr variable contains the results of your MapReduce call; you can iterate its elements using LINQ (if there are many result values), but here we are just taking the First() as we have only one.

You can find the complete sample here.

A touch from the distributed NoSql DB world

Lately I've been reading the 'Seven Databases in Seven Weeks' book and, besides recommending it if you like databases in general, there are a lot of interesting concepts and ideas to learn from it. In this post I won't be talking about the details or good uses of a particular DB, or which one to choose to solve a particular data problem you have, but about something I enjoyed significantly more while reading: the supporting algorithms, data structures and architectures applied to resolve specific problems.

I enjoy watching how complex things can be resolved with just a simple idea, or a couple of them, and this is the case in the DB world where the integration of all the things is more than the sum of individual parts. So, let’s start with the things that gladly caught my attention:

One of the things I liked most, and had never gone into detail about before (beyond having heard its name and vaguely knowing what it could be used for), are Bloom filters. I gladly came to understand how some of the databases analyzed in the book take advantage of them to perform fast lookups when they need to get a specific value. It is a good structure for maintaining an inexact index using significantly less space than an accurate index would use and, with a certain probability and error rate, it can tell you that the thing you are looking for is maybe there, or answer that it is not there with 100% certainty – i.e. it guarantees no false negatives. One interesting aspect of the theory is that you can calculate these probabilities according to how many bits you are using to track things, how many hash functions are used to evaluate each piece of data, and how big the set you are indexing is (or will be). Using this, and having an approximate idea of how your data will grow, you can calculate the number of bits you need to keep the filter's error rate below a specific bound.
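To make the idea concrete, here is a naive Bloom filter sketch (the double-hashing scheme and the particular hash functions are arbitrary choices for illustration, not what any specific database uses):

[sourcecode language=”csharp”]
using System;
using System.Collections;
using System.Collections.Generic;

// m bits, k hash functions: Add() never causes false negatives,
// MightContain() may return false positives with a probability that
// depends on m, k and the number of items added.
public class BloomFilter
{
    private readonly BitArray bits;
    private readonly int hashCount;

    public BloomFilter(int bitCount, int hashCount)
    {
        this.bits = new BitArray(bitCount);
        this.hashCount = hashCount;
    }

    public void Add(string item)
    {
        foreach (var index in this.Indexes(item))
        {
            this.bits[index] = true;
        }
    }

    public bool MightContain(string item)
    {
        foreach (var index in this.Indexes(item))
        {
            if (!this.bits[index])
            {
                return false; // definitely not in the set
            }
        }

        return true; // maybe in the set (could be a false positive)
    }

    private IEnumerable<int> Indexes(string item)
    {
        // Derive k bit positions from two base hashes (double hashing).
        var h1 = item.GetHashCode() & 0x7fffffff;
        var h2 = StringComparer.OrdinalIgnoreCase.GetHashCode(item) & 0x7fffffff;

        for (var i = 0; i < this.hashCount; i++)
        {
            yield return (int)((h1 + (long)i * h2) % this.bits.Length);
        }
    }
}
[/sourcecode]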

A good practical example is how this is used within HBase. As this database stores columns in chunks that are split among several servers according to a random sharding scheme (decided by the servers), you can know which chunk you need to go to when looking for something, but you don't know whether that something is actually there or not (here a Bloom filter can save you from searching at all if it answers that what you are looking for is not there). You can find an interactive example and related material here.

Another interesting concept used in many NoSql databases to meet query and aggregation demands, among others, is the MapReduce algorithm. While the idea is very simple to understand, it is a powerful concept and frees you to perform whatever logic you want over your distributed, and probably huge, sets of data in several ways. Its foundation is the divide and conquer principle, where a task (or big problem) is split and treated as subtasks (or small problems) returning partial results, and at the end a final step is performed to merge them all (this approach sometimes helps to reduce computational complexity and required resources, both of which can be analyzed theoretically). Despite being a very old concept in computer science, the new MapReduce boom is related to how Google (who patented it) uses it to handle its huge query demands. That is a particular scenario that happens on the nodes of a cluster, so it introduces parallelism as well: each of the participants contains part of the data you need to process and waits for the function (or logic) it must execute over its own subset in order to return its results to the caller. An overall view consists of a first step (map), where you possibly evaluate and convert a set of elements to another domain, and a merge step (reduce), which operates on the lists of results from the different nodes to generate the final result.
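As a toy, single-process illustration of the two steps (not a distributed implementation), counting word occurrences with a map step and a reduce step could look like this:

[sourcecode language=”csharp”]
// Map: each document produces (word, 1) pairs; Reduce: pairs are grouped by key
// and merged. In a real cluster the map step runs on the node that owns each
// document and only the partial results travel over the network.
var documents = new[] { "the quick brown fox", "the lazy dog", "the fox" };

var mapped = documents
    .SelectMany(doc => doc.Split(' ')
        .Select(word => new KeyValuePair<string, int>(word, 1)));

var reduced = mapped
    .GroupBy(pair => pair.Key)
    .Select(group => new { Word = group.Key, Count = group.Sum(pair => pair.Value) });

foreach (var result in reduced)
{
    Console.WriteLine("{0}: {1}", result.Word, result.Count);
}
[/sourcecode]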

A practical example of this can be found in Riak, where the map/reduce function is sent to the cluster and distributed over the participant nodes in the "Riak Ring". The map function is run on each of the nodes in the cluster, close to where the data is (without the need to transfer 'raw' data through the network). So, besides avoiding network latency issues (from moving big sets of data), you also don't need high centralized processing power, as this architecture evaluates each subset in each individual node in parallel. Finally, the reduce function is executed after all the partial results from the nodes have been collected. You can find more details about how this works here.

A (very) interesting theorem from theoretical computer science, related to distributed computing, is the CAP theorem. It basically asserts that in a distributed system you can simultaneously have two of the following properties – Consistency, Availability and Partition Tolerance – but not all three of them. Taking into consideration the environment and the many factors that act against a distributed system, it is not possible to fulfill these three features without falling into the fallacies of distributed computing.

Beyond what the theorem says, it's more interesting to see how some NoSql databases that are also distributed deal with it and allow 'playing' with this assertion. If we bring Riak into the scene again, we find that this DB has an interesting way of parameterizing its inner workings when reading/writing data within the cluster. Riak's ring architecture distributes, and replicates, data within the nodes of its cluster. This clearly makes the DB Partition Tolerant, but the other two features are still available for you to play with: Consistency & Availability. Nevertheless, in the case of Riak you will never achieve full consistency if you use it in a real distributed environment (Riak has something it calls 'Eventual Consistency').

This database allows you to freely select the values for: the total number of nodes (N) that will participate in replicating data, how many nodes must be written to (W) before a successful write is returned, and how many nodes are read from (R) to get the latest value for a key. With these parameters you can, for example, achieve high availability by setting a high value for N; nevertheless, if what you want is consistency, you can try setting W=N or R=N (leaving the other value, R or W, at 1) – the first set of settings is called consistency by writes and the second consistency by reads. You might think that, with a high N and one of these two choices, you could achieve both consistency and availability, but this is not the case, since having a high number of replica nodes leads to the following scenario: if you configure consistency by writes, the number of endpoints an update can hit is bigger, leading to inconsistent versions of data for the same key (which the user must resolve), in addition to degrading performance when writing to multiple nodes. If you try to set consistency by reads you will end up in a similar scenario, showing that you can't achieve both. I encourage you to read more about how Riak lets you play with this and about its internals.
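A small sketch of the trade-off described above, using illustrative values (these are plain C# objects for the sake of the explanation, not Riak API calls):

[sourcecode language=”csharp”]
// N = replicas per key, W = nodes that must acknowledge a write,
// R = nodes consulted on a read.
public class ReplicationSettings
{
    public int N { get; set; }
    public int W { get; set; }
    public int R { get; set; }

    // W = N: every replica holds the latest write, so any single read is enough.
    public bool IsConsistentByWrites { get { return this.W == this.N; } }

    // R = N: a read always consults the replica(s) that acknowledged the latest write.
    public bool IsConsistentByReads { get { return this.R == this.N; } }
}

// e.g. fast writes, consistency enforced on reads:
var settings = new ReplicationSettings { N = 3, W = 1, R = 3 };
Console.WriteLine(settings.IsConsistentByReads); // True
[/sourcecode]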

This is how Riak stands in front of the CAP theorem, and it's nice to see how, by tweaking some parameters and playing a little with the cluster configuration, you can set Riak up as {c}AP, having consistency sometimes (or eventually). Other databases achieve consistency or availability exclusively, but not both.

There are also other interesting structures and techniques used in the distributed DB world and, if you like algorithms, data structures and architectures like I do, I recommend you dive into them: inverted indexes, vector clocks, write-ahead logging, distributed file systems, DB sharding, and many more.

Thanks for reading, happy diving!! :)

NHibernate mappings tips using an Alternative Key

Recently I've been working with NHibernate and, besides configuring trivial mappings, I had to deal with some not-so-trivial ones: linking two tables using an alternative key between them. In this short post I'll summarize the necessary steps in case you need to do this too (in a very easy and mechanical way).

Below is an abstraction of the data model I had at that time (two tables related using an alternative key on the Master):

[Diagram: Master and Detail tables related through the AlternativeKeyQ field]

Besides the primary key of each one of the tables (composite for Master, and not defined for Detail), you can see the relation between both tables through the AlternativeKeyQ field (by taking advantage of this unique field you don't have to propagate the composite PK from Master to Detail). The Detail table also has a foreign key to another entity, called ForeignKeyN in this case.

So, how do you setup the NHibernate mappings to make it work?

First of all you'll need to define your domain object for Detail. Of course, you want to have Master and ForeignKeyN instances to allow easy navigation from Detail to both entities. So here is your base class defined in C#:

[sourcecode language=”csharp”]
public class Detail
{
public virtual Master Master { get; set; }
public virtual ForeignKeyN ForeignKeyN { get; set; }
}
[/sourcecode]

Now you have two navigation, and entity, properties in your Detail domain object. You need to define your mapping:

[sourcecode language=”xml”]
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
<class name="MyNamespace.Detail, MyAssembly" table="Detail" lazy="true">
<many-to-one name="Master" column="AlternativeKeyQ" property-ref="AlternativeKeyQ" class="MyNamespace.Master, MyAssembly" />
<many-to-one name="ForeignKeyN" column="ForeignKeyN" class="MyNamespace.ForeignKeyN, MyAssembly" />
</class>
</hibernate-mapping>
[/sourcecode]

You can see how the relation to Master is defined there: using the 'property-ref' attribute we let NHibernate know that we are using that field on the Master table to establish the relation. But if you try to parse this NHibernate mapping within a session, you'll receive an error saying (basically) that the entity doesn't contain an identifier.

So we're going to fix this by adding an identifier to the entity. As we need to add this field to both the class and the mapping, we are going to define it in a way that only NHibernate makes use of it:

[sourcecode language=”csharp”]
public class Detail
{
public virtual Master Master { get; set; }
public virtual ForeignKeyN ForeignKeyN { get; set; }
protected virtual int Id
{
get { return this.Master.AlternativeKey; }
set { /* do nothing */ }
}
}
[/sourcecode]

We have now added a protected property to the Detail class that we are going to use as the main identifier within the NHibernate mapping (but it is not exposed in the object's interface – i.e. it won't be visible to users).

[sourcecode language=”xml”]
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
<class name="MyNamespace.Detail, MyAssembly" table="Detail" lazy="true">
<id name="Id" column="AlternativeKeyQ" />
<many-to-one name="Master" column="AlternativeKeyQ" property-ref="AlternativeKeyQ" class="MyNamespace.Master, MyAssembly" insert="false" update="false" />
<many-to-one name="ForeignKeyN" column="ForeignKeyN" class="MyNamespace.ForeignKeyN, MyAssembly" />
</class>
</hibernate-mapping>
[/sourcecode]

As you can see in the mapping, we are using the Id property of the class as the identifier and mapping it to the AlternativeKeyQ table field. We also disabled insertion and updates (insert="false", update="false") via the many-to-one Master property, as otherwise it would clash with the Id update when NHibernate tries to update the entity.

In the Master mapping, to add a bag with the Detail entities, you have to use the 'property-ref' attribute again to let NHibernate know that you are linking these entities by the AlternativeKeyQ field on the Master (if NHibernate requires you to override Equals() and GetHashCode() within your Detail domain object, use your new Id property to compare and return hash values).
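Regarding that last note, here is a minimal sketch of what those overrides could look like inside the Detail class, using the protected Id (i.e. the alternative key) as the identity:

[sourcecode language=”csharp”]
// Inside the Detail class: identity is based on the alternative key exposed
// through the protected Id property used by the NHibernate mapping.
public override bool Equals(object obj)
{
    var other = obj as Detail;
    return other != null && this.Id == other.Id;
}

public override int GetHashCode()
{
    return this.Id.GetHashCode();
}
[/sourcecode]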