Hello everyone,
I am involved in a scenario where there is a huge (SQL Server 2005)
production database containing tables that are updated multiple times
per second. End-user reports need to be generated against the data in
this database, and so the powers-that-be came to the conclusion that a
reporting database is necessary in order to offload report processing
from production; of course, this means that data will have to be
replicated to the reporting database. However, we do not need all of
the data in the production database, and perhaps a filtering criteria
can be established where only certain rows are replicated over to the
reporting database as they're inserted (and possibly updated/deleted).
The current though process is that the programmers designing the
queries/reports will know exactly what data they need from production
and be able to modify the replication criteria as needed. For example,
programmer A might write a report where the data he needs can be
expressed in a simple replication criteria for table T where column X
= "WOOD" and column Y = "MAHOGANY". Programmer B might come along a
month later and write a report whose relies on the same table T where
column X = "METAL" and column Z in (12, 24, 36). Programmer B will
have to modify Programmer A's replication criteria in such a way as to
accomodate both reports, in this case something like "Copy rows from
table T where (col X = "WOOD" and col Y = "MAHOGANY") or (col X =
"METAL" and col Z in (12, 24, 36))". The example I gave is really
trivial of course but is sufficient to give you an idea of what the
current thought-process is.
I assume that this is a requirement that many of you may have
encountered in the past and I am wondering what solutions you were
able to come up with. Personally, I believe that the above method is
prone to error (in this case the use of triggers to specify
replication criteria) and I'd much rather use replication services to
copy tables in their entirety. However, this does not seem to be an
option in my case due to the sheer size of certain tables. Is there
anything out there that performs replication based on complex
programmer defined criteria? Are triggers a viable alternative? Any
alternative out-of-the-box solutions?
Any feedback would be appreciated.
Regards!
Anthony
Hello Paul,
Thank you for your response. I will look into SQL Server 2005's
replication and static row level filtering; are there any books or web-
sites you might recommend? I will need to be able to set up and modify
the row-filter criteria programatically, and the reviews @. Barnes and
Noble on "Pro SQL Server 2005 Replication" are pretty dismal.
Regards,
Anthony
On Apr 17, 3:09 pm, "Paul Ibison" <Paul.Ibi...@.Pygmalion.Com> wrote:
> I'd recommend transactional replication with a nosync initialization. This
> is where the initial setup on the reporting server is achieved by using a
> restore of the database and after that, only subsequent changes are sent
> down. If you are using SQL Server 2005, greater concurrency can be achieved
> by using the read committed snapshot isolation level.
> Cheers,
> Paul Ibison SQL Server MVP,www.replicationanswers.com
|||Hello Paul,
On Apr 17, 6:26 pm, "Paul Ibison" <Paul.Ibi...@.Pygmalion.Com> wrote:
> Have a look at Hilary's book for snapshot and transactional, but if you're
> after more merge info and don't like the Pro book then it's really BOL that
> you need and then doing some scenarios for yourself to gain experience. As
> for websites, I have some useful info on the site below and there are other
> articles out on the various SQL Server sites you can get by googling, but
> nothing I think specific to your requirements.
Will do.
> BTW this doesn't really lend itself to modifying the filters dynamically. At
> least this is not as straightforward as you might think. Normally the
> partitions are well designed to start with. If you want something more
> dynamic, then I'd not filter at all in replication and I'd use filters on
> the client application instead.
> Cheers,
> Paul Ibison SQL Server MVP,www.replicationanswers.com
Ahhh... then that's a problem, I'd definitely need the ability to be
able to programatically and dynamically change the filtering criteria
as the need arises, in this case every time a new report is requested
that needs a subset of data not being captured by the replication
process. You would think that this is such a common scenario... Also,
filtering on the client side is not an option either since that would
mean that all of the data would get replicated to the reporting db. I
could have sworn that I read in msdn that the filters could be changed
via stored procs though... I'll have to look that up.
Thanks for your help Paul!
Anthony
|||Anthony Paul wrote:
> I am involved in a scenario where there is a huge (SQL Server 2005)
> production database containing tables that are updated multiple times
> per second. End-user reports need to be generated against the data in
> this database, and so the powers-that-be came to the conclusion that a
> reporting database is necessary in order to offload report processing
> from production; of course, this means that data will have to be
> replicated to the reporting database. However, we do not need all of
> the data in the production database, and perhaps a filtering criteria
> can be established where only certain rows are replicated over to the
> reporting database as they're inserted (and possibly updated/deleted).
> The current though process is that the programmers designing the
> queries/reports will know exactly what data they need from production
> and be able to modify the replication criteria as needed. For example,
> programmer A might write a report where the data he needs can be
> expressed in a simple replication criteria for table T where column X
> = "WOOD" and column Y = "MAHOGANY". Programmer B might come along a
> month later and write a report whose relies on the same table T where
> column X = "METAL" and column Z in (12, 24, 36). Programmer B will
> have to modify Programmer A's replication criteria in such a way as to
> accomodate both reports, in this case something like "Copy rows from
> table T where (col X = "WOOD" and col Y = "MAHOGANY") or (col X =
> "METAL" and col Z in (12, 24, 36))". The example I gave is really
> trivial of course but is sufficient to give you an idea of what the
> current thought-process is.
> I assume that this is a requirement that many of you may have
> encountered in the past and I am wondering what solutions you were
> able to come up with. Personally, I believe that the above method is
> prone to error (in this case the use of triggers to specify
> replication criteria) and I'd much rather use replication services to
> copy tables in their entirety. However, this does not seem to be an
> option in my case due to the sheer size of certain tables. Is there
> anything out there that performs replication based on complex
> programmer defined criteria? Are triggers a viable alternative? Any
> alternative out-of-the-box solutions?
Is it possible to create views, then configure things so that just
those views are replicated as tables on the second server?
|||Hello Ed,
That's a very good idea, if it turns out that the filter isn't
dynamically configurable then perhaps using a view as a filter and
replicating the view can compensate since views can be modified at any
time. However, I doubt that replication can be done on a view rather
than on a table. I'll have to check it out!
Regards,
Anthony
On Apr 17, 9:46 pm, Ed Murphy <emurph...@.socal.rr.com> wrote:
> Anthony Paul wrote:
>
> Is it possible to create views, then configure things so that just
> those views are replicated as tables on the second server... Hide quoted text -
> - Show quoted text -
|||I just finished looking up using indexed views versus a filter for
replication and it turns out that the view is much slower than a
filter (about 3x as slow) because the log reader has to log each
transaction twice, once for the view and once for the table. In my
case performance is of utmost concern so the overhead involved in this
is not something they can live with. Sigh...
>
> - Show quoted text -
|||On Apr 18, 12:24 am, Anthony Paul <anthonypa...@.gmail.com> wrote:
> Hello Paul,
> On Apr 17, 6:26 pm, "Paul Ibison" <Paul.Ibi...@.Pygmalion.Com> wrote:
>
> Will do.
>
> Ahhh... then that's a problem, I'd definitely need the ability to be
> able to programatically and dynamically change the filtering criteria
> as the need arises, in this case every time a new report is requested
> that needs a subset of data not being captured by the replication
> process. You would think that this is such a common scenario... Also,
> filtering on the client side is not an option either since that would
> mean that all of the data would get replicated to the reporting db. I
> could have sworn that I read in msdn that the filters could be changed
> via stored procs though... I'll have to look that up.
> Thanks for your help Paul!
> Anthony
This may sound like a stupid question, but are you sure replicating
the whole database isn't an option? I know you've described the
database as huge, but one mans huge is another mans insignificant (or
the other way around).
It just sounds like you're putting in a lot of work when you may be
able to keep it simple. Apologies if this is a path you've already
worn smooth, just wondering what has made you sure that bog standard
replication isn't the way to go.
Damien
|||Hello Damien,
I would *love* to have a full replication going and not have to worry
about the added complexity of creating dynamic filters or triggers,
but the powers that be simply do not consider it an option. Since I'm
not the one that makes the decisions, I can only go by whatever
options are available. They want ONLY a subset of data to be captured,
nothing more. That would be fine with me if the filter was static (ie.
not subject to change every time a new report is requested) but given
the requirements I am in the same camp as you that a full replication
would be best.
Regards,
Anthony
> This may sound like a stupid question, but are you sure replicating
> the whole database isn't an option? I know you've described the
> database as huge, but one mans huge is another mans insignificant (or
> the other way around).
> It just sounds like you're putting in a lot of work when you may be
> able to keep it simple. Apologies if this is a path you've already
> worn smooth, just wondering what has made you sure that bog standard
> replication isn't the way to go.
> Damien- Hide quoted text -
> - Show quoted text -
|||Anthony Paul wrote:
> I would *love* to have a full replication going and not have to worry
> about the added complexity of creating dynamic filters or triggers,
> but the powers that be simply do not consider it an option. Since I'm
> not the one that makes the decisions, I can only go by whatever
> options are available. They want ONLY a subset of data to be captured,
> nothing more. That would be fine with me if the filter was static (ie.
> not subject to change every time a new report is requested) but given
> the requirements I am in the same camp as you that a full replication
> would be best.
TPTB may start considering it an option if you give them a cost
analysis, depending on whether their previous motivation was "wouldn't
it be nice if" (yes, but) or "we think this is cheaper" (no it isn't)
or "this is required for security reasons" (ugh, okay) or whatever.
|||Anthony Paul (anthonypaulo@.gmail.com) writes:
> I would *love* to have a full replication going and not have to worry
> about the added complexity of creating dynamic filters or triggers,
> but the powers that be simply do not consider it an option. Since I'm
> not the one that makes the decisions, I can only go by whatever
> options are available. They want ONLY a subset of data to be captured,
> nothing more. That would be fine with me if the filter was static (ie.
> not subject to change every time a new report is requested) but given
> the requirements I am in the same camp as you that a full replication
> would be best.
From my meager experience of replication, it seems clear that the database
has to be really huge - several terabytes - to make a dynamic filtering
defensible from a cost perspective. It would be difficult to develop,
difficult to maintain and manage.
The only serious option I see to full replication is a static subset.
That is define what will be supported in replication V1. If a new reqiure-
ment that is not covered, it would have to wait to V2. The idea would
of course to only strip really big stuff with low proability to be included.
And this is what you should tell the powers that be: replicating the entire
database will be far less expensive than changing what is replicated
dynamically.
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment