
New Post: Query generation in batch insert scenarios


Yesterday I made the first commit to the fork with some initial prototyping of the provider API for batching:

public class DbBatchCommand : IDisposable
{
    public virtual DbTransaction Transaction { get; set; }
    public virtual DbConnection Connection { get; set; }
    public virtual int? CommandTimeout { get; set; }
    public virtual DbParameterCollection Parameters { get; }

    public virtual AddToBatchResult TryAdd(DbCommandTree commandTree,
        Dictionary<string, Tuple<TypeUsage, object>> parameterValues, bool hasReader);

    public DbDataReader Execute();
    public DbDataReader Execute(CommandBehavior behavior);
    public Task<DbDataReader> ExecuteAsync(CancellationToken cancellationToken);
    public Task<DbDataReader> ExecuteAsync(CommandBehavior behavior,
        CancellationToken cancellationToken);
}

public enum AddToBatchResult
{
    NotAdded,
    AddedSameResultSet,
    AddedDifferentResultSet
}

abstract class DbProviderServices
{
    public DbBatchCommand StartBatch();
}

public interface IDbBatchConfiguration
{
    int MaxUpdateBatchSize { get; }
}
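
To make the intended call pattern concrete, here is a rough sketch of how a caller (for example the update pipeline) might drive this API; providerServices, commands, and ConsumeResults are placeholders rather than part of the proposal, and error handling is omitted:

// Rough sketch of a caller driving the proposed API; `providerServices`,
// `commands` and `ConsumeResults` are placeholders, and error handling is omitted.
using (DbBatchCommand batch = providerServices.StartBatch())
{
    batch.Connection = connection;
    batch.Transaction = transaction;

    foreach (var command in commands)
    {
        // The provider decides whether this command still fits in the batch.
        AddToBatchResult result =
            batch.TryAdd(command.Tree, command.ParameterValues, command.HasReader);

        if (result == AddToBatchResult.NotAdded)
        {
            // The provider ended the batch: execute what has accumulated so far,
            // then a new batch would be started for the remaining commands.
            using (DbDataReader reader = batch.Execute())
            {
                ConsumeResults(reader);
            }
            // ... start a new batch and re-add the rejected command ...
        }
    }

    using (DbDataReader reader = batch.Execute())
    {
        ConsumeResults(reader);
    }
}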

The idea behind it is that both the provider and the caller can end the batch at any time. The result of executing the batch is presented as a single DbDataReader that can have a separate result set for each of the batched commands, or group some of the commands into one result set. Commands that would not normally return a DbDataReader should now return the number of rows affected as a row. This allows concurrency issues to be checked for each command individually while still allowing the provider to batch together different types of commands.
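
For example, under this convention a caller could walk the returned reader (reusing the batch variable from the sketch above) and check each command's outcome roughly like this; this is a sketch only, the exception shown is illustrative, and real code would also have to distinguish rows-affected rows from rows carrying server-generated values:

// Sketch: one result set per batched command (per the convention above);
// commands without a real result set surface rows affected as a single row.
using (DbDataReader reader = batch.Execute())
{
    do
    {
        while (reader.Read())
        {
            // Assumes the first column of a rows-affected row holds the count.
            int rowsAffected = reader.GetInt32(0);
            if (rowsAffected != 1)
            {
                // Illustrative only; EF would raise its usual concurrency error.
                throw new OptimisticConcurrencyException(
                    "A batched command affected an unexpected number of rows.");
            }
        }
    }
    while (reader.NextResult()); // advance to the next batched command's results
}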

This approach, however, limits batching to groups of commands that are mutually independent (i.e. that don't use server-generated values from previous commands). One way of dealing with this, as already mentioned in the thread, is to use id generation strategies; this could also be prototyped in the fork (a trivial sketch follows below). Another way is to change the command trees sent to the provider to include the notion of the temporary ids that EF uses internally; the provider could then replace them with parameters and still execute all the commands in one batch, but this would break existing providers.
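
As a trivial illustration of the id generation route: a key generated on the client before SaveChanges makes the insert independent of any previous statement, since nothing needs to be read back from the server. A GUID key is the simplest case; the Order class and context below are invented for the example (a hi-lo/sequence allocator would be the equivalent for integer keys):

// Sketch: a client-generated key removes the dependency on a
// server-generated value, so the inserts can be batched freely.
public class Order
{
    public Guid Id { get; set; }
    public string Customer { get; set; }
}

// The id is known before the INSERT is ever sent to the server:
var order = new Order { Id = Guid.NewGuid(), Customer = "Contoso" };
context.Orders.Add(order); // `context` is a placeholder DbContext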

I also included a default provider implementation that creates batches of a single command, and updated the SQL Server provider to do batching by concatenating the commands.
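
For reference, a single-command batch along the lines of that default could look roughly like the following. This is a simplified sketch rather than the code in the fork: it assumes the Execute methods end up overridable (the signatures above don't show that), glosses over the rows-affected-as-a-row convention, and uses a hypothetical CreateCommand helper:

// Sketch of a non-batching fallback: accept exactly one command per batch
// and reject the rest, so each command ends up executing on its own.
public class SingleCommandBatch : DbBatchCommand
{
    private DbCommand _command;

    public override AddToBatchResult TryAdd(DbCommandTree commandTree,
        Dictionary<string, Tuple<TypeUsage, object>> parameterValues, bool hasReader)
    {
        if (_command != null)
        {
            return AddToBatchResult.NotAdded; // the batch is already "full"
        }

        _command = CreateCommand(commandTree, parameterValues); // hypothetical helper
        return AddToBatchResult.AddedDifferentResultSet;
    }

    public override DbDataReader Execute()
    {
        _command.Connection = Connection;
        _command.Transaction = Transaction;
        return _command.ExecuteReader();
    }
}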

We are interested in measuring how different batching strategies compare when used with EF. Currently the candidates are:

  1. Concatenating CRUD statements (already in the fork)
  2. Grouping sequential inserts or deletes into a single statement
  3. Using table-valued parameters to group sequential inserts or updates

Since the last two options only work for certain types of operations, a combination of different strategies could be used to achieve the best results; a sketch of the table-valued parameter option follows below.
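
To give a flavor of the table-valued parameter option, here is what a hand-written version of a grouped insert could look like in plain ADO.NET; the table dbo.Customers and the table type dbo.CustomerRowType are invented for the example:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Sketch: a run of sequential inserts sent as one statement via a
// table-valued parameter. The type would be created once on the server:
// CREATE TYPE dbo.CustomerRowType AS TABLE (Name nvarchar(100))
static void InsertCustomers(SqlConnection connection, IEnumerable<string> names)
{
    var rows = new DataTable();
    rows.Columns.Add("Name", typeof(string));
    foreach (var name in names)
    {
        rows.Rows.Add(name);
    }

    using (var command = new SqlCommand(
        "INSERT INTO dbo.Customers (Name) SELECT Name FROM @rows", connection))
    {
        var parameter = command.Parameters.AddWithValue("@rows", rows);
        parameter.SqlDbType = SqlDbType.Structured; // marks it as a TVP
        parameter.TypeName = "dbo.CustomerRowType";
        command.ExecuteNonQuery();
    }
}

One appeal of this shape is that the statement text stays the same no matter how many rows are sent, which helps plan caching.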

I am currently still busy with other tasks, so I probably won't be able to work on the prototype for the next two weeks, but I will monitor the discussions. Any further feedback, ideas, or code contributions would be very helpful at this stage.

Some other open questions include handling output parameters and stored procedures that return multiple result sets, as well as whether other providers can accommodate the results of a batch in a single data reader or whether the API should be changed to accept multiple readers.

Thanks,

Andriy

