donderdag 1 november 2012

Creating a relational database - generate scripts

Semanta Modeler offers a powerful script generation engine, which gives you the possibility to translate your semantic database model into a relational database. This script generation engine is a two step process: first it translates your semantic model into a relational model, second it generates a SQL script for the relation model to create the database definition in the target database of choice (at this moment Oracle, Microsoft SQL, PostgreSQL, MySQL and Interbase/Firebird are supported).

Generating a relation model
When translating a semantic model to a relational model, there are three 'problems' which need to be resolved:
  1. Translate the object and attribute names for use in the target database
  2. Translate semantic relations into foreign key columns and constraints
  3. Translate the semantic concepts for specializations
Ad 1: Translating the object and attribute names
For all kind of reasons the mapping of the semantic object/attribute name will not always be one-to-one with the original name in the semantic model. For this reason Semanta Modeler has some options to convert the names, which can be configured in the Model properties window (SQL settings tab).

The left side of this window shows three options to influence the naming:
Object name prefix: Prefix each created table with a specific prefix.
Add object shortname as attribute prefix: Each created object in the model can/will have a short name (typically a 3 character abbreviation of the object name). This option will use this as a prefix for each generated column.
Always use column prefixes from generalisation: When using specializations the prior option, the specialization attributes will get the prefix from the specialization they belong to. With this option, it is possible to get the generalization prefix on the specialization attributes.

Ad2: Translate semantic relations into foreign key columns and constraints
The concept of a relation in a semantic model should will be converted to the concept of a foreign key in the relational database. For this purpose the relation model generator will create a foreign key column in the table corresponding to each primary key column in the table being referenced. The naming of these foreign key columns can be configured in the right side of the SQL settings tab of the Model properties window (shown above).

The foreign key column mapping can be configured with a number of options, and offers a quite flexible way of naming your relational columns. The settings provided here are at model level, so will be valid for all foreign keys in the model. In addition to these settings, Semanta Modeler has an option to override the settings at a specific foreign key level.

Ad3: Translate the semantic concepts for specializations
The concepts specialization and generalization are typically not supported by a relational database. For this reason Semanta Modeler has a mapping algorithm which maps the specializations in your model to the relational model. The mapping comes down to the following:
  1. The relational database will only have tables for the generalizations
  2. A specialization block will translate to a column in the generalization table of type string, and a check constraint to allow for the possible short names of the specialization.
  3. All attributes for the specialization are added as optional columns to the table of the generalization. To distinguish between these columns, the names will be prefixed with either the short name or full name of the specialization object.
  4. When a record is of a specialization of a specific type, all mandatory columns of that specialization should have a value. A trigger will be generated to check for mandatory columns in the specialization when needed. This trigger will also guard foreign keys to specializations to only allow references to records of the correct type.

For example this model:

Translates into this relational table (for Microsoft SQL server):

create table dbo.Person (
  PEN_Id         int identity(1, 1) not null
 ,PEN_Name       varchar(40) not null
 ,PEN_Employment varchar(3) not null
 ,constraint sc_PEN_Employment check (PEN_Employment in ('EME', 'UND'))
 ,EME_Salary     float null
 ,constraint pk_Person primary key (PEN_Id)

Or this (using different SQL settings):

create table dbo.Person (
  Id              int identity(1, 1) not null
 ,Name            varchar(40) not null
 ,Employment      varchar(3) not null
 ,constraint sc_PEN_Employment check (Employment in ('EME', 'UND'))
 ,Employee_Salary float null
 ,constraint pk_Person primary key (Id)

Using this kind of specialization mapping (into the generalization), the specialization concept is especially useful for the modelling of a type or kind field into a record, which toggles a mandatory check on a set of columns.

Data type mapping
When generating a relation database from the definition in Semanta Modeler, we quickly run against the different implementations for the different target databases. One of these differences is the mapping of the data types. The table below shows how the different data types known to Semanta Modeler are mapped against the target database:

Data type Oracle Microsoft SQL PostgreSQL MySQL Interbase/Firebird
Blob BLOB varbinary(max) bytea blob blob
Boolean CHAR(1) bit boolean boolean boolean
Char CHAR(s) char(s) char(s) char(s) char(s)
Datetime DATE datetime timestamp without timezone datetime timestamp
Float - single BINARY_FLOAT float real float float
Float - double BINARY_DOUBLE float double precision double double
Integer INTEGER int integer integer integer
Long Integer INTEGER bigint integer bitint bigint
Memo CLOB varchar(max) text text blob sub_type text
Number NUMBER(s, p) numeric(s, p) numeric(s, p) numeric(s, p) numeric(s, p)
Spatial SDO_GEOMETRY geometry geometry geometry N/A
String VARCHAR2(s) varchar(s) varchar(s) varchar(s) varchar(s)
Time INTERVAL DAY TO SECOND time time time N/A
Timestamp TIMESTAMP(p) datetime2(p) timestamp(p) with timezone timestamp timestamp
Unicode Char NCHAR(s) nchar(s) char(s) char(s) char(s) character set unicode_fss
Unicode Memo NCLOB nvarchar(max) text text blob sub_type text
Unicode String NVARCHAR2(s) nvarchar(s) varchar(s) varchar(s) varchar(s) character set unicode_fss
UUID RAW(16) uniqueidentifier uuid uuid char(38)
XML XMLTYPE xml xml xml blob sub_type text

The script generator uses the type mappings to generate the scripts. In addition to plain attributes, the semantic model can contain additional attribute types (which have a special meaning). These attributes are the specialization block and the relation. These two attributes require the special handling as described earlier (when translating to the relational model) as these concepts are unknown or different in the world of a relational database.

Generated output scripts
Before generating the database scripts, Semanta Modeler will check the consistency of the model. When errors are found, the export cannot proceed. You can trigger the consistency check manually by using the Check model option from the Model toolbar. This check can generate errors, warnings and hints in order to guide you in creating a correct database model.

Once the model is found to be correct, we can start the generation of scripts. When generating scripts for a full database (creating a new database from scratch), Semanta Modeler will generate a number of scripts for you (which scripts are available also depend on the choosen target database, as not each database supports all concepts):

These scripts contain the following:
  • Create table script: This script contains the DDL statements for the creation of all tables in the model (as described above).
  • Create index script:This script contains the DDL statements for the creation of additional indexes which have been defined in the model. In addition it will automatically include an index on every foreign key and specialization block in the model (when not defined in the model).
  • Create view script: This script contains the DDL statements for the creation of database views for each of the specializations in the model. Each view is defined as a selection on the generalization table, with a filter on the specialization block field. It will also only contain the columns applicable to the specialization.
  • Create triggers script:This script contains the code to guard the specialization's mandatory fields and the relations to specializations (as described above).
  • Create sequences script: This script contains the DDL statements for the creation of database sequence objects. When attributes in the domain have been defined to be auto-incrementing, some database implementations need a sequence object to generate the numbers for these fields. Semanta Modeler will create a sequence for each auto-incrementing field in the model.
  • Create comments script:This script contains the code to add the object and attribute comments from the model on the target database DDL. After running this script, a database DBA or developer can see the model comments in tools like Oracle SQL developer or Microsoft SQL Server Management Studio.
  • Create sample data script: This script contains INSERT statements for demo data which has been entered as part of the model.

Full script export versus version migration
When generating a database script, Semanta Modeler offers the option to generate scripts for creating a new database, or to generate a migration script between two versions of the model (so it will migrate the existing database to the new version of the model). Using the latter scripts, it is possible to apply changes to the model to an existing database.

The assumption for the migration script will always be that the current state of the database is 100% equal to the scripts generated by Semanta Modeler for the old version. The aim of the migration scripts is to maintain the existing data as much as possible. As these kind of migrations can become quite complex, not all situations can be handled, but the script attempts to give a very useful starting point.

The basis for the version migration is formed by the version compare tool (which is also available as a report in the semantic modeler). The migration script generator will combine the version differences (as found by the version compare tool) with the generated relation model, and generate the necessary DDL statements to update the target database.

A last useful tip: breaking a complex migration into several smaller steps (using multiple intermediate versions), will increase the ability of Semanta Modeler to generate a correct migration automatically for you.

maandag 15 oktober 2012

Improved Readability - The use of submodels in Semantic data modeling

Within Semantic modeling we have so called Scenarios. They are there simply because of readability. It is proven in practice that people are better able to read and understand a semantic model firstly due to its hierarchical notation and secondly because of the use of Scenarios. Scenarios typically zoom into logical submodels and show the links to the relevant objects in other submodels.

Let's have a look at the Microsoft example database AdventureWorks.

AdventureWorks is a bicycle manufacturing company . It's employees produce these bicycles themselves but since 2000 they also purchase sub components. They sell to customers all over the world.

The overall Semantic model looks like this:

Logically there are 4 main submodels: HR, Production, Purchasing and Sales. Let's zoom into each of these concepts.



A central concept Person is used to support the concept of Employees (used in Purchasing, Production and Sales) and Customer Persons. Per Person phone nrs and creditcard information is stored. Employees are modelled in a hierarchical way keeping track of each employees manager. Furthermore the Departments of the company and their employees are stored. History is kept of Payments and Job Candidates.



The central data object is Product along with its Bill of Materials using different Units of Measure. Products are (sub)categorized and there are different models with illustrations and descriptions per culture. Products are produced according to WorkOrders which are routed accros Locations. The system keeps track of the Inventory. Photos and Documents are stored for each product. A history is logged of Product Costprices, Reviews, Transactions and Listprices. Product prices are listed as so called Special Offers with a certain validity period.




Vendors supply product components which link into the bill of materials of the end product. The system contains all the purchase order details. The employees and products involved in the Purchasing process are also shown in this picture.



The central data object here is Customer and all of its Sales order details (quantities, prices and reasons). Customers are found in different territories grouped in Countries/Regions. The system is able to handle multiple currencies, exchange rates and tax rates. The employees and products involved in the Sales proces are also shown in this picture.

Entity Relationship modeling

Just to compare both methods a picture of the ER model of AdventureWorks.

donderdag 4 oktober 2012

Useful Design Patterns - lists & multiple self referring trees

Useful Design Patterns

The cost and time to design new data models can be brought down when combining a number of typical solutions into a new one. Reusing certain design patterns is advisable because it is very efficient to have solution features which quickly lead to a reliable result.

In this message we discuss :

-(Multiple) List(s)
-(Multiple) Self Referring Trees
-A bit about Time driven modelling.


Let's assume we need a concept country for our solution. The most simple format we can think of is a data object called Countries. It enables us to list all countries of the world and select from them.

Multiple Lists

Now suppose we like to limit the country choice list per continent for a bit more user convenience. A typical solution would be to add a new data object Continent.
After selecting a continent we can subselect the countries and only show these to the user. It also enable users to filter data by contintent for reporting purposes.

Self referring tree + List

We all know reality is more complex and therefore let's assume we need a concept called Region to store data on. This can be organised in a simple self referencing tree whereby we can model the regiontypes like continent, country, state, department,etc in a simple list.

Here we see that nested regions like [Earth,America,USA,Texas,Houston] can be simply stored in one data object which refers to itself.

Multiple self referring trees and link lists

When we only require geographical system this would be sufficient. However when we want regions to be member of multiple Regional groupings we need a bit more complex construction.

The data object NestedRegionTypes contains different regional grouping systems like:

Dutch system

  • Country
  • Province
  • Community

French System

  • State
  • Region
  • Departement
  • Arrondisement
  • Kanton
  • Commune

Intercontinental Country Organisations

  • APEC
  • G7
  • etc

European Country Organisations

  • EU
  • Benelux
  • etc

American Country Organisations

  • OAS
  • etc

Having these systems in place each Region has a mandatory geographical type so we can always find it by using the geographical ordering discussed above :
  • -USA (country) of North America (continent)
  • -France (country) of Europa (continent)
  • -Netherlands (country) of Europa (continent)
  • -Texas (state) of USA (country)
  • -Haute Provence (Region) of France (country)
  • -Utrecht (province) of Neth)erlands (country
  • -etc

Now on top of that each region (for example a country like USA) can be member of multiple other regional grouping systems (e.g. APEC, G7, NAFTA) by entering them in the data object RegionLink.
  • APEC,Japan
  • G7,USA
  • G7,France
  • NAFTA,Mexico
  • EU,France
  • EU,Netherlands


One can now add grouping systems of any type to the database and directly use them without any programming effort. For example the collection of ''Kantons speaking a particular language in Switzerland'' can be added in two steps.

First create a grouping system of lingual division in the dataobject NestedRegionTypes.

Language spoken
  • German
  • French
  • Italian
  • Romansh

Secondly link the relevant Swiss Regions to their lingual group in the dataobject RegionLink

  • Geneva, French
  • Graubunden, Romansh
  • Ticino, Italian
  • Schaffhausen, German

More time driven flexibility

Finally, to add a bit more flexibility, when membership to grouping systems vary in time you can give them a start- and enddate attribute to filter out the non-current regions at any give moment.

Let's take as an example the Commonwealth of Nations established in 1931, today a group of 54 countries. The entries in the RegionLink data object look like:
  • Commonwealth, Australia, 1931, today
  • Commonwealth, SriLanka, 1948, today
  • Commonwealth, Tanganyika, 1961, 1964
  • Commonwealth, Ireland, 1931, 1949
As you can see in 1940 SriLanka was not a member whereas today Ireland is no longer a member, so the list varies in time.

That's it for now, hope you found any of the solutions useful for your application.

maandag 10 september 2012

Applied Datamodeling BV launches SEMMOD

Without doubt Data Modelling is at the very heart of all IT systems regardless of their technology, development paradigm, applicationtype or market segment. Since more and more systems get interconnected it becomes more and more important to keep track of all relevant data types in the IT ecosystem.

When more objects and more relationships between objects exist in one environment it gets more complex to keep a proper overview. Which systems are influenced when someone wants to extend an object with a number of new attributes?

All these matters can be addressed when using the data modeling tool Semanta Modeler launched September 2012 by Applied Data Modeling BV.

What kind of data modeling tool is Semanta Modeler?

The basic concept is to only support a strict object hierarchical notation. This means that objects that refer to other objects are in a higher hierarchy than objects that are being referred. Because of this hierarchical notation only a few symbols are needed to depict the objects in a data model with their respective relations which make the model intuitive and understandable for non IT specialists.

To make the concept stronger the modeler supports important notations for inheritance, grouping of inheritors (adjunct and disjunct specializations) and multiple-inheritance.

These features enable the user to create data models which are describing the full semantic conceptual scope of the IT systems of any organization.

For productivity reasons a number of database scripting export options are supported to enable the user to put their models to work (Oracle, SQLServer, MySQL, etc).

For calibration and conversion reasons reverse engineering of database models is also supported.