CodePlexProject Hosting for Open Source Software

Ok, after long delay I've finally found some time to deal with mssql10 spatial extension. After 8 hours of study I've recognized, that M$ surprisingly went in totaly different way with storage format, surprisingly seems that this way is better than what
we know from traditional GIS suppliers and super open OGC.

When I saw storage format for sqlserver10, first word that gets into my mind was "performance". Sql server 10 has only very primitive spatial indexing method and there is no precomputed topology, so geometry storage format must be far more efficient
than what we allready know from others. What I found, seems quite good to me.

**Here is base binary storage sequence/format for all sqlserver geometry types (all in NDR - little endian encoding):
**

- int32 SRID //geom SRID
- int16 coordListTypeEnum //type of coordinate list, see source code
- fixedcoordList or variableSizeCoordList //array of all coordinates for this geometry
- //optional part
- double[] Z or ZM // array of Z values for geometry, for point also M value (for example for rotation angle)
- appendixWithTopologyDef //geometry topology definition for linestring, polygon and multi geometry types, quite mysterious chunk of data...

ok, thats basically all. Lets go deeper and look at some simple example

select geometry::STGeomFromText('POINT(1 1)', 0);

gives:

0x00000000 010C 000000000000F03F 000000000000F03F

easy to understand, now lets have more complex example:

select geometry::STLineFromText('LINESTRING (0 0, 10 0, 10 10, 0 10, 0 0, 2 2, 2 4, 4 4, 4 2, 2 2 )', 0);

gives:

--0x00000000 0104 0A000000 0000000000000000 0000000000000000 0000000000002440 0000000000000000 0000000000002440 0000000000002440 0000000000000000 0000000000002440 0000000000000000 0000000000000000 0000000000000040 0000000000000040
0000000000000040 0000000000001040 0000000000001040 0000000000001040 0000000000001040 0000000000000040 0000000000000040 0000000000000040 01000000 01 00000000 01000000 FFFFFFFF 00000000 02

hmhm,....seems quite complicated, but it is not that bad. After SRID there is coordlist type 0401 (in NDR 0104). 401 is short coordinate list.

Anyway this sample geometry is small and sqlserver topology probably parses it in different way than long coordinate list (see source code, coordinate list type enumerator for details).

Next value is total number of points in coordinate list (int32), followed by all points (variable size coordinate list). As mssql topology is 2D only, all points are also saved as 2D (x,y pair). The only difference between geometry and geography is right here.
Geometry saves given x,y sequence as x,y. Geography saves x,y as y,x.

After end of this list, begins some mystery, that I dont understand well but it is quite easy to read still:

01000000 is total number of figures in this geometry graph (simple linestring has only one figure), this is fine.

from 01 over FFFFFFFF to 00000000 is magic....

and last byte value represents domain geometry type (point, linear, surface), here 02 = linear (01 = point, 03 = surface)

Current version of MSSQL10.GeometryFactory can handle point and line only, I have to read the magic part better to finish all (any voluntiers ??) ;-)) Please see files inside this cs project for more details, code is selfexplained enough...

Ok what is motivation for this work? again performance & compatibility.

Good geometry renderer needs to access directly array of bytes of coordinates to do its tricks and should not rely on exposed (slow) functions of Geometry, such as STPointN() etc. .NET procedure call overhead is sometimes hugh.

When U have your own geometry rendering model (as I have) U have probably 3 options with sqlserver:

- make it dual (one renderer for current shp, xml, wkb, gdo and one for sqlserver), if else, if else, if else......
- wrap up sqlserver geometry and geography into your geometry model (better but slow...)
- or convert sqlserver geometry to common format like wkb, prior to load into renderer (GC likes this very much, will probably eat RAM at speed 300MB/sec)

;-(

Direct read of blob from db into renderer and work with this blob is simple and straight forward way.

Another reason might be that some organisations might have allready data in sqlserver in WKB or GDO formats and put them into native using STFromBinary() methods might cause problems. Especially GDO users suffers from lack of conversion functionality for their
blobs directly inside db.

Last edited Nov 27, 2011 at 7:53 AM by asida, version 10