Generic Data Access Repositories
As I start to use Entity Framework more, and think more about unit testing scenarios, I've been thinking a lot about "generic repositories".
For those that aren't aware of the term, there are lots of articles out there already. Code 52 posted a series recently about them, for example.
I've never been able to wrap my head around generic repositories. I understand that a repository layer (between your business logic and the data persistence layer) is a good idea and helps immensely with testing, but a generic repository doesn't seem like the right solution to me.
Let's look at the two main examples of generic repositories I've seen.
One Repository Per Entity
First there's the "one repository per entity" type, which looks something like this:
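    using System.Collections.Generic;
    using System.Data.Entity;
    using System.Linq;

    public class Repository<T> where T : class
    {
        private readonly DbContext _context;

        // Public so that callers (or an IoC container) can construct it.
        public Repository(DbContext context)
        {
            _context = context;
        }

        public T Get(object key)
        {
            return _context.Set<T>().Find(key);
        }

        public IEnumerable<T> GetAll()
        {
            return _context.Set<T>().ToList();
        }

        // ... etc ...
    }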
So the idea here is that you are able to define a Repository<Product> and a Repository<Customer> and so on, one for each entity. It might be an interface instead of a class, or it might work on an IUnitOfWork interface rather than directly on a DbContext. You get the idea.
My problem with this approach is when you go up a layer and look at your business logic. Let's say we have an application service that needs to read from both customers and products (maybe a price list service). I like to be explicit with my dependencies so I would surface them as constructor parameters, like this:
    public class PriceListService
    {
        public PriceListService(Repository<Customer> customers, Repository<Product> products)
        {
            // ...
        }
    }
This is nice and explicit. I can read that constructor and know straight away that this service needs access to those two entities. But is it Jimmy-proof? What's stopping a developer from passing in a product repository that talks to a live database and a customer repository that talks to a staging database? I suppose CQRS proponents would say that that's exactly why you'd take this approach - because the two repositories should be allowed to live in separate databases. I understand that argument, but in a traditional transactional system where everything lives in one database, this approach scares me.
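To make that worry concrete, here is a minimal sketch of the mix-up. FakeContext is a hypothetical stand-in for DbContext that only remembers which database it targets, and the Repository<T> here is cut down to that one detail; neither is real Entity Framework code.

```csharp
using System;

// Hypothetical stand-in for a DbContext: it only records its target database.
public class FakeContext
{
    public string Database { get; }
    public FakeContext(string database) { Database = database; }
}

public class Customer { }
public class Product { }

// Same shape as the per-entity repository, reduced to the context it wraps.
public class Repository<T> where T : class
{
    public FakeContext Context { get; }
    public Repository(FakeContext context) { Context = context; }
}

public class PriceListService
{
    public string CustomerDatabase { get; }
    public string ProductDatabase { get; }

    public PriceListService(Repository<Customer> customers, Repository<Product> products)
    {
        // Nothing here, or in the type system, requires a shared context.
        CustomerDatabase = customers.Context.Database;
        ProductDatabase = products.Context.Database;
    }
}

public static class MixUpDemo
{
    // Jimmy's wiring: customers from staging, products from live.
    public static PriceListService BuildMixedUp()
    {
        var live = new FakeContext("live");
        var staging = new FakeContext("staging");
        return new PriceListService(
            new Repository<Customer>(staging),
            new Repository<Product>(live));
    }
}
```

This compiles and runs without complaint, which is the point: the constructor signature names the entities but says nothing about where they come from.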
One Repository Per Session
The second approach to generic repositories is the "one repository per session" style, like this:
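    using System.Collections.Generic;
    using System.Data.Entity;
    using System.Linq;

    public class Repository
    {
        private readonly DbContext _context;

        // Public so that callers (or an IoC container) can construct it.
        public Repository(DbContext context)
        {
            _context = context;
        }

        public T Get<T>(object key) where T : class
        {
            return _context.Set<T>().Find(key);
        }

        public IEnumerable<T> GetAll<T>() where T : class
        {
            return _context.Set<T>().ToList();
        }

        // ... etc ...
    }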
You'll note that in this case the class itself isn't generic - the methods are. That means you only have one repository for your database (which addresses my previous concern nicely) and your price list service would look like this:
    public class PriceListService
    {
        public PriceListService(Repository repo)
        {
            // ...
        }
    }
You'd query the individual entities by calling repo.GetAll<Product>().
This is more Jimmy-proof, since Jimmy can't accidentally spin up more than one database connection and pass several of them into our service.
However, you'll remember that earlier in this post I said:
"I like to be explicit with my dependencies"
Where does this approach leave us in terms of explicit dependencies? All I know is that the price list service needs access to my database. For all I know, it could be writing to orders or reading from inventory. There's nothing explicit about the dependencies at all.
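A minimal sketch of that hidden reach follows. The Repository below is a hypothetical in-memory stand-in for the per-session repository (a dictionary instead of a DbContext), and the Order entity and the Touched log are assumptions added purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product { public string Name { get; set; } }
public class Order { public int Quantity { get; set; } }

// Hypothetical in-memory stand-in for the non-generic Repository.
public class Repository
{
    private readonly Dictionary<Type, List<object>> _store =
        new Dictionary<Type, List<object>>();

    // Records every entity type requested, so the hidden reach becomes visible.
    public List<Type> Touched { get; } = new List<Type>();

    public void Add<T>(T entity) where T : class
    {
        if (!_store.TryGetValue(typeof(T), out var list))
        {
            list = new List<object>();
            _store[typeof(T)] = list;
        }
        list.Add(entity);
    }

    public IEnumerable<T> GetAll<T>() where T : class
    {
        Touched.Add(typeof(T));
        return _store.TryGetValue(typeof(T), out var list)
            ? list.Cast<T>().ToList()
            : new List<T>();
    }
}

public class PriceListService
{
    private readonly Repository _repo;

    // The signature only says "I need the database".
    public PriceListService(Repository repo) { _repo = repo; }

    public List<string> ProductNames()
    {
        return _repo.GetAll<Product>().Select(p => p.Name).ToList();
    }

    // Nothing in the constructor hinted that orders were involved.
    public int UnitsOrdered()
    {
        return _repo.GetAll<Order>().Sum(o => o.Quantity);
    }
}
```

The only way to learn that the service touches orders is to read its body, or to instrument the repository as the Touched list does here.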
The Non-Generic Approach
Here's what I've been toying with as an antidote to generic repositories:
    public interface IPriceListRepository
    {
        IEnumerable<Product> GetProducts(IQuery<Product> query);
        IEnumerable<Customer> GetCustomers(IQuery<Customer> query);
        PriceList AddPriceList(PriceList list);
        PriceList RemovePriceList(PriceList list);
    }
(I'm using the "query object" pattern here to query the products and customers in an open way, but that's by the by.)
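The post doesn't define IQuery<T>, so here is one possible shape, purely as an assumption: a query object takes an IQueryable<T> and hands back a narrowed one. The ProductsUnderPrice class, the Price property and the sample data are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

// Assumed shape: a query object narrows an IQueryable<T> and returns it.
public interface IQuery<T>
{
    IQueryable<T> Apply(IQueryable<T> source);
}

// Hypothetical query: products at or below a maximum price, cheapest first.
public class ProductsUnderPrice : IQuery<Product>
{
    private readonly decimal _maxPrice;

    public ProductsUnderPrice(decimal maxPrice) { _maxPrice = maxPrice; }

    public IQueryable<Product> Apply(IQueryable<Product> source)
    {
        return source.Where(p => p.Price <= _maxPrice).OrderBy(p => p.Price);
    }
}

public static class QueryDemo
{
    // Runs the query against an in-memory list rather than a database.
    public static List<string> CheapProductNames()
    {
        var products = new List<Product>
        {
            new Product { Name = "Widget", Price = 5m },
            new Product { Name = "Gadget", Price = 25m },
            new Product { Name = "Gizmo", Price = 2m }
        }.AsQueryable();

        return new ProductsUnderPrice(10m)
            .Apply(products)
            .Select(p => p.Name)
            .ToList();
    }
}
```

With this shape, a repository method such as GetProducts could apply the query to its underlying set and materialise the result, so callers choose the filter without seeing the context.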
This is a very explicit interface. It surfaces only the bits that my price list service will need. My service now looks like this:
    public class PriceListService
    {
        public PriceListService(IPriceListRepository repo)
        {
            // ...
        }
    }
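Since testing is part of the motivation here, this is a rough sketch of a test double for that interface. InMemoryPriceListRepository and All<T> are hypothetical names, the entity classes are bare placeholders, and the IQuery<T> shape is an assumption because the post doesn't spell it out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product { public string Name { get; set; } }
public class Customer { public string Name { get; set; } }
public class PriceList { public string Title { get; set; } }

// Assumed shape of the query object; the post does not define IQuery<T>.
public interface IQuery<T>
{
    IQueryable<T> Apply(IQueryable<T> source);
}

public interface IPriceListRepository
{
    IEnumerable<Product> GetProducts(IQuery<Product> query);
    IEnumerable<Customer> GetCustomers(IQuery<Customer> query);
    PriceList AddPriceList(PriceList list);
    PriceList RemovePriceList(PriceList list);
}

// Hypothetical test double backed by plain lists instead of a database.
public class InMemoryPriceListRepository : IPriceListRepository
{
    public List<Product> Products { get; } = new List<Product>();
    public List<Customer> Customers { get; } = new List<Customer>();
    public List<PriceList> PriceLists { get; } = new List<PriceList>();

    public IEnumerable<Product> GetProducts(IQuery<Product> query)
    {
        return query.Apply(Products.AsQueryable()).ToList();
    }

    public IEnumerable<Customer> GetCustomers(IQuery<Customer> query)
    {
        return query.Apply(Customers.AsQueryable()).ToList();
    }

    public PriceList AddPriceList(PriceList list)
    {
        PriceLists.Add(list);
        return list;
    }

    public PriceList RemovePriceList(PriceList list)
    {
        PriceLists.Remove(list);
        return list;
    }
}

// Hypothetical query object that matches every row.
public class All<T> : IQuery<T>
{
    public IQueryable<T> Apply(IQueryable<T> source) { return source; }
}
```

A test could hand new InMemoryPriceListRepository() to the PriceListService constructor and then inspect the PriceLists list afterwards, without any database in the picture.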
This approach has its drawbacks. It's more code to maintain, since you have one repository per service. I'm sure there are other problems with it too, and I invite you to let me know those in the comments.
What do you think? Which approach (of these three, or something new) do you take, and why? Are you aware of the drawbacks? Do you care?
Comments
Cam
My usual preference is for the second option, as some classes can have a way of having their constructor arguments explode, so I like to keep the list small.
But why not have both? Why not have a repository factory that creates repositories per entry?
In the first example what's to stop Jimmy from creating repositories linking to different DBs? Are you using an IoC to inject these constructor parameters, or are you doing them by hand? Are you performing code reviews, in which case Jimmy's db faux pas would be picked up then?
I think given the extra amount of coding required to make the 3rd solution work, as well as people remembering that that's the pattern to use anyway, it's easier to manage this sort of thing by doing one of the other options with an IoC. Even if Jimmy tries to new up the service manually he has to get the repositories somehow, and that's usually from the IoC anyway.
Damian Brady
I've been trying to come up with the right solution for a while (with the help of other SSW guys). The latest incarnation is my favourite, and it matches your first example fairly closely.
I appreciate the possibility of having repositories against two different contexts; however, I'm not convinced that's an issue. Because it's the context (or Unit of Work) that should do any saving, you need explicit knowledge of that context to persist any of your changes.
When I implement this pattern, rather than passing repository concretions into the constructor, I prefer to let a DI framework do the work and only pass in interfaces, both for the repositories and for the context used to instantiate the repository and do the saving later.
Matt Hamilton
Of the two generic examples, I definitely prefer the second one.
And yeah, all my apps are wired up with IoC, and I trust myself not to make the mistake.
I guess the main theme in this post is generic repositories, defensive coding and explicit dependencies, and how the three concepts can (or can't) work together.
Colin Scott
I've actually moved fairly firmly away from Repositories in general. ORMs are an abstraction; I find that adding an additional abstraction on top adds little value and tends to cover over useful ORM capabilities.
I'm much more likely to encapsulate complex queries in a query object which exposes ORM abstractions. This is a POCO that you can new up and which can have necessary data set on properties. By returning ORM abstractions you can use it to hold a complex part of the query without removing your ability to compose it by adding additional restrictions. So the paging etc. can still be a responsibility of the client. You can do this with a repository but it tends to bundle different queries together simply because they're queries. I also find it less composable.
I really question the value of a method that only does "return _context.Set<T>().Find(key);". Why can't the client use the context directly? Infrastructure to support this in a test is trivial to build and with many ORMs you can even test against an in-memory database. You might not want the business logic proper to be aware of the ORM but hiding it from the co-ordinating logic (controller, service, whatever) doesn't seem to add any value.
I'm not an Entity Framework user specifically (I have used Linq2SQL in the deep and distant past and I'm not going back). In many cases I'd suggest that complex queries may be better served by a micro-ORM (I'm quite fond of Dapper). That leaves the main ORM to pull out aggregate roots by key. (Standard "this approach is not suitable for all systems" caveats apply. Offer void where prohibited).
Colin Scott
(continued due to comment length limits)
I suppose my concern really is (from painful experience) that if you have to spend a lot of effort hiding that your tool is there then maybe you're either not using the tool correctly or the tool is a bad fit. And in general I find repositories to fall on the side of attempting to hide an ORM rather than encapsulate the complexity of specifying a query.
I do actually have a scenario at the moment where I hide the data store (RavenDB in this case) behind an abstraction. That's because for these particular uses the code may run in environments (iOS via MonoTouch for instance) where the data store doesn't run. This is for a limited subset and the more complex queries will use the data store abstractions directly. I just don't think it should be the default approach.
Tudor
Entity Framework (like many other O/RMs) does implement the repository pattern - that's why it doesn't make too much sense to build a repository on top of it.
Many people forget that an O/RM is NOT some ADO.NET replacement; it is an abstraction built on top of it. A simple read of Fowler's book will make it much clearer what a repository is and when somebody has to build their own (which is not the case when using EF).
Koen Metsu
I used this implementation of your first generic option in one of our projects. The use of an IOC container and classes like the databasefactory avoided the possibility of someone mixing up the staging and production db for the different repositories. In general, this approach was very comfortable for the developers to work with due to its flexibility.