Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Friday, May 1, 2009

Implementing Distinct clause in Google Apple Engine Datastore

People coming from RDBMS background used to write typical sql queries involving distinct clause, doing table joins may be highly disappointed the way JDO handles all this. There are fundamental architectural differences between JDO and conventional RDBMS base model layer. Where RDBMS is a collection of logically related physical tables which can be queries or manipulated using sql queries on the other hand JDO persists the entities in the data store. Relations between entities are defined within entity definition to other entities. Based on relation groups these entities are persisted to data store. Google App Engine heavily believes in the philosophy of scalability. That is why, a lot of typical sql like operations are not supported.

So, RDBMS based community with zero or very less experience with JDO may initially take it as a flaw or painful stuff. In practice once you come to the GAE architecture mindset you start appreciating the way these things can be handled in GAE. Ok so let's take the example of implementing a distinct clause in JDO.

JDO Query Language (JDOQL) does not support the use of distinct clause. So, if you have a list of addresses where you have to find all the unique contury names then in RDBMS you will simply write:

"select distinct countryName from Address"

Now, to implement the same in JDO what you should do is you need to create a separate entity class say CountryList. So, whenever you create a new Address entity just check in the CountryList if the countryName already exists? If no then insert it. So, in a way you have already a list of distinct country names. You can simply query CountryList to get the names of distinct countries.

Now, the question comes why it is not supported in JDOQL. The answer is it is not scalable and will perform poorer along with number of Addresses added to the table. So, even in SQL ultimately when you write select distinct ... it behind the scene fetch all the addresses from the table and build a list of distinct countryName. Imagine if you have 1 million recs in your table then how bad the performance of distinct could be. But going the JDO ways that is maintaining a distinct list at the time of Address creation would be a good performance save and scalable as well.

No comments: