Saturday, 10 August 2013

Lumia 1020 - cameras and phones considered

I am not particularly interested in photography. I am not a "prosumer". The only camera I own - a compact Lumix with a lowly 8 megapixels - is barely worthy of discussion.

Yet I am quite interested in the Lumia 1020 phone, which combines a great camera with a phone (or so I am led to believe - I have never used one). Despite having no real interest in photography, like many parents with young children, I take quite a lot of photographs. And I don't use my phone for most of them.

Why not, given that I carry my phone everywhere? Simply put - the most disappointing aspect of my iPhone 4S is the mediocrity of the camera. Remember all that rubbish at the launch of the 4S where we were told that "the camera on the iPhone 4S will often be the best camera someone has owned"? Well, that might have been the brief to the camera team at Apple, but it wasn't what they delivered. In indoor situations, even in moderately good light, the image quality is awful.

As an example, let's look at a scene I shot while on holiday. These images were taken with my iPhone 4S and my Lumix compact, both in fully automatic mode with no post processing.

iPhone 4S image

Lumix (camera) image
Even just looking at the images you can see how much worse the one from the phone looks. Maybe I could fiddle around in a photo editor to make it look less bad, but the point is to show what the results are like for someone who just wants to take the photos and not get involved in post processing. If I crop both of them, you can see a large amount of noise in the image from the phone.

Cropped version of the iPhone 4S image
Cropped version of the Lumix image
So, having conclusively demolished the idea that the iPhone 4S is a suitable device for a novice taking anything but the most transient of photographs, why do I care about the camera on my phone? Well, although my compact camera is not heavy, I still find it awkward to carry around, and transferring the files from it is a pain. The phone has everything in its favour, yet since I might want to keep the photographs for more than a day I still put up with the relative inconvenience of the camera. A lot of people may not feel the quality is bad enough to warrant carrying around a camera, but that does not necessarily mean they would not welcome a quality camera phone if its other features are acceptable.

I have never really seen any phone manufacturer highlight the camera as much as Nokia, and from what I have read it would seem that the engineering of the hardware is exceptional, and the software is good (although it could be better). If I assume, without having tested it, that it meets my requirements in terms of the camera, would I actually buy one?

This Guardian review of the Lumia makes an analogy with audio technologies to argue that the trade-off is too great: people ultimately want native apps (i.e. they won't accept Windows Phone 8's paucity of apps) and therefore won't buy the Lumia. It is true that people were happy to trade sound quality for convenience in the battle that MP3 won over Super Audio CD. But in a world where phone hardware capabilities are becoming more commoditised, the camera is a differentiator that many will welcome, and the trade-off is getting more complex to evaluate. What are people really willing to trade in terms of the software platform to get a decent camera?

I use several Google services - mail, search, maps, blogger. I store my photos on Flickr. The only Microsoft consumer service I use is SkyDrive (after much consideration). I definitely would not move to Windows Phone if it forced me to use Outlook.com, Bing and so on. But which of the many apps on my iPhone are really key to me? Gmail, Flipboard, Twitter, Kindle, RSA Securid (for VPN), SkyDrive, Spotify and Chrome. I have some others but not in my must-have list - such as my local taxi firm and Peppa Pig to keep my daughter amused from time to time.

The most important app is Chrome. Not just because I use it everywhere and my bookmarks follow me, but because it is the app that is actually becoming a platform (although, to be fair, it's not quite there yet on mobile devices in terms of a store). As the capabilities of Chrome as a platform and abstraction layer increase, more apps will move from native to Chrome. We are already seeing UI sophistication growing rapidly for HTML mobile apps; by cementing that with a richer platform, the operating system on the phone will become less and less relevant.

So there we have it: my decision about whether to buy the Lumia 1020 Windows Phone rests with Google - will they port Chrome to Windows Phone 8?

Saturday, 3 August 2013

MongoDB and the Single View of the Customer

There was quite a lot of publicity generated a few weeks ago by 10Gen, the commercial organisation behind MongoDB, about how one of their major financial services customers (MetLife) was using their product to create an application that provides a "360 degree view of the client". For nearly three years I have been leading a programme that created data services used to satisfy similar use cases, and I wanted to share my thoughts on the 10Gen approach. Note that I have absolutely no knowledge of the MetLife solution beyond what I have read in the 10Gen press releases and in the more technical webinar.

I should state at the outset that I think MongoDB is potentially a very interesting technology to use to create a data service (which is what I believe this problem is really about), for reasons that I will go into later in this post. I do think however that there are several challenges and considerations not really discussed in the webinar that anyone doing this type of thing really needs to think hard about up front.

In case you haven't watched the webinar, the key reasons cited in the presentation to support MongoDB being a good choice are:
  1. The lack of a strongly enforced schema makes it more agile, so delivery is faster and cheaper
  2. Performance compared with relational databases is much higher (reduced latency, as well as being inherently more scalable with built-in automatic sharding)
  3. Rich querying (compared with NoSQL solutions that provide a basic key/value interface)
  4. The aggregation framework allows roll-ups to be done in the database (again compared with other NoSQL solutions that may push more work to the client)
  5. Map/Reduce, either natively or connected to Hadoop, to support offline analysis of the data

Managing Expectations

The webinar states that the first release was to support the application used by the contact centre staff, but that the intention is for other applications to use the data service (including client-facing web apps). This is a key point - I am sure the business sponsors will now have an expectation that IT can just hook up different applications with minimal cost and time. And that existing applications will continue to work.

With any data service, the first release is by far the simplest: you can change anything without worrying about the impact on existing applications. It is a very different story as soon as you have multiple applications consuming the data. They will all likely have their own release schedules and budgets, so they won't want to do releases at the same time. Unless you are working in an organisation that is extremely agile, this is going to be a big issue if you don't plan for it.

Build a Solid Foundation

For the reason noted above, I would always recommend building your first release for more than one application - a single application may skew the use case, and the data model gets better scrutiny with more than one consumer. Once you have applications in production, making any non-evolutionary (i.e. breaking) changes to the data model becomes more and more expensive as the number of consumers grows.

The webinar only briefly mentions the technology that sits on top of the database (I inferred that it was an XML service, but in any case there will be some kind of typed model). Whatever the format of the data, it is critical that there is a very detailed data dictionary and that the schema is documented with samples. As in most complex domains, there are many different but valid ways of modelling common concepts in financial services, so good samples need to be provided.

This makes the first claimed benefit of Mongo largely moot - the lack of an enforced schema is really not an advantage when writing an enterprise data service, because change in such an environment must be planned and controlled among different groups. (Writing a data tier for a single application would be a different story entirely.) You do, however, want tooling that can help manage the change and evolution of the schema (something I will discuss in future posts).

Canonical Model

The other point in the webinar that is, in my view, short-sighted is that they appear to conform very little of the data (at least that is the impression they give). Effectively they serve data in the format staged from source systems. This is not a unique approach - I have seen several applications built using data from the staging area of a data warehouse (mostly tactical ones, I should add). There are a few obvious problems with this approach:

  1. It forces the burden of interpreting the different source system structures and conventions onto the consumers of the data. At that point your consumers will likely question what value the enterprise data service is adding.
  2. If a major data source is upgraded resulting in changes to its data model then consumers have to deal with that.
  3. Unless you conform the data, it becomes harder to aggregate it without introducing errors, or display it unambiguously to clients. Some simple examples: are the prices for bonds clean or dirty? Are taxes and fees split out or consolidated? Trade or settlement date accounting? How are backdated corrections handled?
The last point is particularly important if you intend to display data to clients directly on a website. But it is also very relevant to anyone building any kind of MI platform, especially if the users do not understand (or do not want to have to understand) the intricacies of each data source.

Performance

The claims about performance and scalability are made without mentioning competitor products, so it's hard to comment on them. While the established relational databases generally do not have a scale-out model, or have only recently introduced an in-memory model (e.g. SQL Server 2014 CTP, aka Hekaton), I would be interested in comparing the performance with an in-memory, scale-out relational database such as the outstanding VoltDB.

While the querying cited as an advantage is certainly rich compared with other NoSQL solutions I have used, the example in the webinar is extremely simple and does not really attempt to show how it compares with a relational solution. Claims about performance would be far more credible if they used a complex example involving filtering, grouping and aggregation.

Simplicity

The most obvious advantage, in my view, of the MongoDB solution would be the vastly reduced complexity of constructing an XML representation of the data. This is clearly something that hierarchical databases are well suited to. (Although hierarchical databases are not new - MUMPS, anyone? - and come with their own set of limitations.) The webinar does not really highlight this aspect as I recall.

Conclusion

I think there is a great deal of mileage in exploring how a hierarchical (or document) database could be used in an enterprise data service, but I don't think the 10Gen webinar covers a lot of the areas that people with the battle scars of implementing such a service would identify with. In particular, unless you are taking an extremely short-term view, the schema flexibility is not a major bonus for this kind of application.

There are several other areas anyone embarking on this kind of project needs to think about carefully to ensure long-term success and a lower cost of ownership, and I will blog about these in future: schema and service evolution; test automation of data acquisition (ETL) processes (most of the well-known tools offer very little to support this essential part of modern software development); and supporting 24x7 operation, including system upgrades in a batch environment.

Sunday, 30 June 2013

Locking, catalogue tables and index rebuilds

This isn't meant to be a blog devoted to SQL Server, but since I manage a relatively complex SQL Server application I come across a lot of interesting things relating to the product and building applications with it. This particular example came up in production but as usual I have created a simpler example based on AdventureWorks 2012.

To give some context, the real application from which this is example derived needs to load quite a lot of data as quickly as possible every night. The strategy for that is to partition switch in the data, and to speed up the load into the stage table we disable all the indexes prior to inserting the rows. (Then we re-enable the indexes before partition switching - when partition switching the source table must match the destination exactly). Not only does this ensure that the data is loaded very quickly, but it also leaves no fragmentation.
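As a rough sketch of that pattern (the table, index and partition names here are made up for illustration; the real statements are generated dynamically by the proc shown below), the nightly sequence looks something like this:

-- 1. Disable the nonclustered indexes on the stage table before the load
ALTER INDEX IX_Stage_Col1 ON dbo.StageTable DISABLE;

-- 2. Bulk load the rows (only the clustered index is maintained during the insert)
INSERT INTO dbo.StageTable (Col1, Col2)
SELECT Col1, Col2 FROM dbo.SourceData;

-- 3. Rebuild the indexes so the stage table exactly matches the destination
ALTER INDEX IX_Stage_Col1 ON dbo.StageTable REBUILD;

-- 4. Switch the loaded data into the destination partition
ALTER TABLE dbo.StageTable SWITCH TO dbo.TargetTable PARTITION 1;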

This approach was used in a number of places, so a generic stored proc was written which queries sys.indexes to determine the set of indexes to be either rebuilt or disabled:

CREATE PROC DYNAMIC_REBUILD
(
    @TABLE_NAME varchar(255),
    @ACTION varchar(50)
)
AS
BEGIN
    -- Build one ALTER INDEX statement per nonclustered index on the target
    -- table (index_id > 1 skips the heap/clustered index), then execute them
    DECLARE @cmd varchar(8000)
    SELECT @cmd = COALESCE(@cmd, '') +
                  ' ALTER INDEX [' + i.name +
                  '] ON ' + @TABLE_NAME + ' ' +
                  @ACTION + ';' + CHAR(13) + CHAR(10)
    FROM
        sys.indexes i
    INNER JOIN
        sys.objects o
    ON i.object_id = o.object_id
    WHERE
        OBJECT_SCHEMA_NAME(o.object_id) + '.' + OBJECT_NAME(o.object_id) =
            @TABLE_NAME
    AND o.type_desc = 'USER_TABLE'
    AND i.index_id > 1

    exec(@cmd)
END

The issue I am going to describe in detail is contained in that proc, but it wasn't obvious to me or the developer who had written the proc.

During the course of the overnight batch process, the proc was called multiple times, often in parallel (but targeting different tables, of course) because of parallel data loads. The first issue was that we experienced deadlocks nearly every day. The deadlock was on sysschobjs, which was initially surprising, although the proc was called within a transaction. That is easily resolved by adding nolock hints, but in hindsight this should have been a strong indication of the underlying problem with the proc.
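For clarity, the join in the proc with those hints applied looks like this (the rest of the query is unchanged):

FROM
    sys.indexes i WITH (NOLOCK)
INNER JOIN
    sys.objects o WITH (NOLOCK)
ON i.object_id = o.object_id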

The more interesting issue is that we found that executions blocked each other. To see this for yourself, create the proc above on AdventureWorks (making sure you add the nolock hint against both sys.indexes and sys.objects). Then in two separate sessions execute:

BEGIN TRAN
exec DYNAMIC_REBUILD @TABLE_NAME='Sales.SalesOrderDetail', 
                     @ACTION='REBUILD'

and

BEGIN TRAN
exec DYNAMIC_REBUILD @TABLE_NAME='Sales.Customer', 
                     @ACTION='REBUILD'

You should see that the second session blocks. Let's look at what it is waiting on by executing this query:

SELECT blockee.wait_type, blockee.wait_resource, *
FROM sys.dm_exec_sessions blocker
INNER JOIN sys.dm_exec_requests blockee
ON blocker.session_id = blockee.blocking_session_id

This should return something like this:

wait_type wait_resource
LCK_M_SCH_S OBJECT: 7:1154103152:0 

So, the second session is waiting for a schema stability lock (Sch-S) but on which table?

SELECT OBJECT_NAME(1154103152)

returns: SalesOrderDetail

How odd - the proc call that was passed Sales.Customer as its argument is waiting for a Sch-S lock on the table whose indexes were rebuilt by the first session. Just for completeness, let's check the locks held by the first session:

SELECT request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id=51
AND resource_associated_entity_id = 1154103152
 
returns:

request_mode request_status
Sch-M         GRANT

No surprise really, since you would expect a schema modification lock to be held by an index rebuild, and Sch-S is incompatible with Sch-M, so that causes the blocking. The question is: why does the query attempt to acquire a Sch-S lock at all? I did not know when I faced this, and reading the MSDN documentation on locks does not give any clues (at least not to me).

In order to figure out what was going on I used extended events. (There are several good online resources to help if you aren't familiar with using them, but I would particularly recommend John Huang's site and Jonathan Kehayias's blog, from which I learned a lot.) My objective was to understand which locks were being acquired and released by the query in the proc, and in what order. I created a session as follows:

CREATE EVENT SESSION understand_locking
ON SERVER
ADD EVENT sqlserver.lock_acquired
(
ACTION (sqlserver.session_id)
),
ADD EVENT sqlserver.lock_released
(
ACTION (sqlserver.session_id)
)
ADD TARGET package0.asynchronous_file_target
(
SET filename = 'c:\dev\xelog.xel', metadatafile = 'c:\dev\xelog.xem'
)

This just adds the SPID to the set of standard fields that are recorded. You can see the standard fields by executing this query:

SELECT *
FROM sys.dm_xe_object_columns
WHERE object_name = 'lock_acquired'
AND column_type = 'data'
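As an aside, if you prefer to script it rather than click around in SSMS, the session can be started and stopped with plain T-SQL:

-- Start collecting lock events
ALTER EVENT SESSION understand_locking ON SERVER STATE = START;

-- ... run the workload you want to trace ...

-- Stop collecting when done
ALTER EVENT SESSION understand_locking ON SERVER STATE = STOP;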

After starting the session (interactively in SSMS or with the T-SQL above) and executing the procedures, you will have a file of event data that you need to process. To do that, you have to do some XML processing. You can use SSMS to examine the results in order to get a feel for the structure (I could not find an xsd) - the following query is a reasonable template for how to process the data:

DECLARE @Temp table (ID int identity(1,1), EventData xml);
INSERT INTO @Temp(EventData)
SELECT  cast(event_data as xml) event_data
FROM
    sys.fn_xe_file_target_read_file ('c:\dev\xelog*', 'c:\dev\xelog.xem', null, null) a;

SELECT OBJECT_NAME(ObjectID), * FROM
(
    select ID,
    cast(EventName as varchar(20)) as EventName,
    cast(SessionID as varchar(20)) as SessionID,
    cast(resource_type as varchar(10)) as ResourceType,
    cast(object_id as varchar(16)) as ObjectID,
    cast(mode as varchar(10)) Mode,
    cast(owner_type as varchar(12)) OwnerType,
    cast(database_id as int) DBID,
    cast(resource_0 as varchar(16)) Res0,
    cast(resource_1 as varchar(16)) Res1,
    cast(resource_2 as varchar(16)) Res2,
    cast(lockspace_workspace_id as varchar(20)) lockspace_workspace_id,
    cast(lockspace_sub_id as varchar(10)) lockspace_sub_id,
    cast(lockspace_nest_id as varchar(10)) lockspace_nest_id,
    cast(transaction_id as int) TransID,
    EventTime
    FROM
    (
        SELECT
            a.ID,
            xml.value('../@name', 'varchar(128)') EventName,
            xml.value('../@timestamp', 'datetime') EventTime,
            xml.value('(../action/value)[1]', 'varchar(20)') SessionID,
            xml.value('@name','varchar(128)') [Col],
            CASE 
                WHEN isnull(rtrim(xml.value('(./text)[1]', 'varchar(128)')), '') = '' then xml.value('(.)[1]', 'varchar(128)')
                ELSE xml.value('(./text)[1]', 'varchar(128)')
            END [Value]
        FROM
            @Temp a
        CROSS APPLY
            a.EventData.nodes('//data') ed(xml)
    ) x
    PIVOT
    (
        MAX([Value]) FOR Col IN
        ( resource_type, object_id, mode, owner_type, transaction_id, database_id, 
          lockspace_workspace_id, lockspace_sub_id, lockspace_nest_id, resource_0, resource_1,      
          resource_2)
    ) pvt
) y
WHERE
    (Mode='SCH_M' OR Mode='SCH_S')

This might look a bit daunting but all it does is extract the values from the XML, then pivot them so that you get the data in a single row per event rather than multiple rows. The result looks like this (I have removed columns to make it readable on this blog):

ObjectName                                    EventName     SessionID
CountryRegionCurrency                         lock_acquired     55
CountryRegionCurrency                         lock_released     55
FK_Product_UnitMeasure_WeightUnitMeasureCode  lock_acquired     55
Product                                       lock_acquired     55
FK_Product_UnitMeasure_WeightUnitMeasureCode  lock_released     55
Product                                       lock_released     55

From the events, it is clear that the query is acquiring and releasing Sch-S locks on a large number of tables in the database. At this point it is time to look at the query plan for the query in the proc that builds the SQL.


The highlighted section shows that there is an index scan on sysschobjs. I haven't seen it in any official documentation, but I can only assume that when reading from the catalogue tables, schema stability locks are acquired in addition to shared locks. Reviewing the query, the following bit provokes the scan:

WHERE OBJECT_SCHEMA_NAME(o.object_id) + '.' + OBJECT_NAME(o.object_id) = @TABLE_NAME

And it can easily be rewritten as:

WHERE o.object_id = OBJECT_ID(@TABLE_NAME)
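Putting it together, and including the nolock hints discussed earlier, the query inside the proc now reads:

SELECT @cmd = COALESCE(@cmd, '') +
              ' ALTER INDEX [' + i.name +
              '] ON ' + @TABLE_NAME + ' ' +
              @ACTION + ';' + CHAR(13) + CHAR(10)
FROM
    sys.indexes i WITH (NOLOCK)
INNER JOIN
    sys.objects o WITH (NOLOCK)
ON i.object_id = o.object_id
WHERE
    o.object_id = OBJECT_ID(@TABLE_NAME)
AND o.type_desc = 'USER_TABLE'
AND i.index_id > 1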

Reviewing the modified query plan, we now see index seeks:


If we rerun the procs in two sessions they no longer block each other, and analysis of the extended events shows that Sch-S locks are no longer acquired on all the tables.

In the end, this was a fairly straightforward fix - optimising a poor query. However, the thing I learned was that you need to pay attention to queries that access the catalogue tables, because they can acquire Sch-S locks and lead to contention in some circumstances.

Monday, 10 June 2013

Table variables and the optimiser

The MSDN documentation is very clear about the limitations of table variables in SQL Server, but their convenience does mean that they are quite tempting to use. I recently had to look at a performance issue which turned out to be related to their use. I thought I would share a similar example here, using the AdventureWorks2012 database.

The example is contrived but illustrates the key point. Consider a stored procedure that populates a table variable with rows identifying SalesOrderDetails. So we would have:

DECLARE @TAB AS TABLE
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
)

In a real world example there would be non-trivial logic to populate that table, but in this example let's just take 3000 rows:

INSERT INTO @TAB
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail

Finally, the stored procedure returns the matching rows from the SalesOrderDetail table by joining to the table variable:

SELECT sd.* FROM @TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailID
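If you want to reproduce the I/O figures quoted below, turn on statistics for the session before running the batch:

-- Report logical and physical reads per table for each statement
SET STATISTICS IO ON;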

All pretty simple? Looking at the query plan we have:

At first glance this looks OK. No table scans, no index scans, just a clustered index seek. However, if you look more closely, you can see it is doing a nested loop join, and if you compare the estimated and actual row counts there is a very different picture:

As you can see, highlighted in red, the estimated number of rows in the table variable is 1, whereas it actually contains 3000 rows. Since that translates into 3000 separate index seeks, the I/O statistics show:

Table 'SalesOrderDetail'. Scan count 0, logical reads 9000, physical reads 0....

The problem is that, as documented, the optimiser always assumes a table variable contains a single row. In this case, it results in a suboptimal plan since a hash or merge join would be significantly more efficient. If we convert the above to use a temp table instead, the following plan is used:

This results in significantly reduced I/O:

Table 'SalesOrderDetail'. Scan count 1, logical reads 1226, physical reads 0....
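For reference, here is a sketch of the temp table version of the same example, keeping the same clustered primary key:

CREATE TABLE #TAB
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
)

INSERT INTO #TAB
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail

SELECT sd.* FROM #TAB t
INNER JOIN Sales.SalesOrderDetail sd
ON t.SalesOrderId = sd.SalesOrderId
AND t.SalesOrderDetailId = sd.SalesOrderDetailID

DROP TABLE #TAB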

Table variables can seem appealing since working with temp tables is more awkward. If you are trying to write modular code, the fact that you can create types and instantiate table variables from them allows you to avoid having table definitions scattered throughout your procedures. It also makes it easier to pass data to other procedures using arguments, rather than relying on a suitably named temp table already existing (there is a sketch of this below). However, you need to be very careful about the performance cost. The real world example from which this is derived took about 350 seconds to run with table variables, and that was reduced to about 40 just by changing to temp tables.
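As a sketch of that modularity point (the type and proc names here are invented for illustration), a table type lets you declare the shape once and pass populated instances between procedures:

-- Define the shape once as a table type
CREATE TYPE SalesOrderDetailKeyList AS TABLE
(
    SalesOrderId int not null,
    SalesOrderDetailId int not null,
    PRIMARY KEY CLUSTERED (SalesOrderId, SalesOrderDetailId)
);
GO

-- A proc that accepts the keys as a read-only table-valued parameter
CREATE PROC GET_ORDER_DETAILS
(
    @Keys SalesOrderDetailKeyList READONLY
)
AS
BEGIN
    SELECT sd.*
    FROM @Keys t
    INNER JOIN Sales.SalesOrderDetail sd
    ON t.SalesOrderId = sd.SalesOrderId
    AND t.SalesOrderDetailId = sd.SalesOrderDetailID
END
GO

-- The caller instantiates a table variable from the type and passes it in
DECLARE @Keys SalesOrderDetailKeyList;
INSERT INTO @Keys
SELECT TOP 3000 SalesOrderId, SalesOrderDetailId
FROM Sales.SalesOrderDetail;

EXEC GET_ORDER_DETAILS @Keys = @Keys;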

It would be useful if you could instantiate temp tables from table types, rather than just table variables. Or, alternatively, improve table variables so that they have far fewer limitations and can be used almost interchangeably with temp tables. (Even just allowing an estimated cardinality to be specified would be a good first step.)