Thursday, May 19, 2022

How to Build a Business Glossary Linked with a Data Dictionary [Webinar]

Hello everyone my name is Gert Kononov and I'm a founder and product manager of de todo welcome to my webinar on building business glossary is linked with data dictionaries today I am gonna start with a legal theory and I will talk about what the business glossary and data dictionary actually is oh and I will say a few words about the benefits .

Why you should care about those assets and second part would be all about the presentation of data idea on how you can build a business glossary using data and build a data dictionary map the tool and share it with your within your organization and we'll close with Q&A session you can use a question tap on your right hand side mm to ask questions .

Right now and I will review them after my presentation okay let me let me get started so let's let's talk about business goes first and data dictionary so what they actually are so one of the things to understand and and know about it is that those are two very separate things let me start with business glossary business glossary is a list of .

Business terms used within an industry or organization with the exact unique documentation a definition why why having a list of terms what the definitions would be important in the first place isn't that obvious what what the customer is shouldn't everybody know what it is I mean quite quite basic knowledge well it turns out it's not so .

Let's have this as an example if you would ask few people different people from different departments in your organization what is a customer you would get very very different results so let's start with the CEO for CEO well off key all he cares about is that customers are people or organization who buy products from .

That Marketing would say probably that it's somebody from visits their their website says they carry all about the leads and prospects so for them they're all those all those contacts in their CRM for call center are the people that are called call them on the support line accounting sees that even differently they for them .

Those are those items on on the inverses so you know one sees it as people the other one sees it as as companies and business entities and for IT is all about the roles in the table and so although the definition is I just provided they they are very very different and if you will ask then how many customers I had last year you would .

Get very very different results that's why it's very important in any analytics and data initiative to actually know what you would like to be measuring know what you talk about and have a very common language of the organization so in this case we could identify those issues and provide a unique names with the very exact definitions of those of .

Those entities and this is where this is the role of business glossary so having those unique names and exact definitions this is the exact reason for business glossary so come on an understanding of business terms throughout entire organization okay let me get to the tektite dictionary so doctor dictionary is it says specification of data so .

Basically if we're talking about physical data dictionary then those are the lists of the tables in your databases and list of attributes you can see a list of tables in one of those one of our databases and list of columns and they attribute some descriptions and so this is a data dictionary this is a specification of a specific data .

Set okay why why you need such a thing so identified three reasons one is that you actually know what data you have otherwise they are just trapped in the databases and yeah nobody can use it nobody can even figure out that they would like to measure something analyzed some some data that they can actually use it to get some insights a second .

Purpose is to find the data so even though you know you have a say sales order orders it's not easy to find where exactly so which tables which columns they are kept in I'll show you two examples in in a second mmm and yeah the third one is to to understand even if you have found that you have a amount column for instance on the order and so .

You need to really understand what is it this amount what currency this is is it net amount is it a including the tax and and so on so even if you look at the data you don't always know what you are looking at so let me show you two examples of the complexity I think you should be aware of so this this is a list of tables in Oracle application .

Core called Oracle e-business suite do not tune your monitor you don't see it because there's like a 20,000 of those tables so this is the least this is assumed and all those items are there are single tables or heels so imagine looking for something in Oracle application another example comes from another huge application which is called .

SAP and SAP is a very old application and back then they needed to save every bite and that's why this they decided it's a good idea to have four and five letter names for all the tables and columns so this is actual table from SAP and you know good luck figuring out fun finding the data and doing a report from that so this is where the dictionary .

Comes in place you list it you provide meaningful aliases you provide descriptions you explain how data is related across tables so the inter dictionary allows you to work with your data okay um here are some differences between the two so they have very different focus on data dictionaries all about the data assets and business gosh .

This about business concepts business glossary is a list of terms were data dictionary list of tables and columns or sometimes entities and attributes you can call it different names but essentially is the same thing the goal for business glossary is to have a common vocabulary and understanding of world terms and data dictionaries to .

Understand the assets data dictionary is created mostly by IT because they are the ones working with data and business glossary is is or should be owned by by the business and another very important difference is that there is only one business glossary well it could be split into different sections you'll see in our tool in a second but generally one .

Word has one definition throughout entire organization one term and data dictionaries it's not it's not the same thing and you can have the same let's say customer so you can have a customer multiple times in one database and imagine multiplied by the number of databases and application applications through throughout the entire .

Organization so like that this diagram visitor visualizes this concept so we have one one customer one definition of customer but you can find it in very in multiple databases in multi both tables so this is where mapping one to the other comes into place so you can having a tool and do it in Excel I'll show you a better way right now .

You can map one to the other and you can do it on two different levels so you can map a specific term to the table and you can link is also to the column so here's an example of mapping of the term to the tables you can see that let's say an employee we can find it in the employee table in data warehouse but in two different tables in procurement system .

As well as HR system also health is start in two different tables and so there's many many different occurrences of the of the same item and it's even more the case so with what comes so let's say you have a very generic thing like email address I don't think you need a little finishing for this but this is just an example and you can find .

This specific attribute in many many many table columns in many tables in all your databases so it is not an easy task to actually find all the occurrences in your data dictionary and okay why you should care about mapping one to the other so first benefit is to find what are your key key data elements in the databases so let's say customer you can .

Easily find it located without having to go through all the 20,000 tables or so it works the other way around as well so you can you know business glossary provides additional layer of metadata on top of the technical metadata so if you look at those you know tables and columns they can be very confusing even if you .

Description for the column or table it might not exactly be clear but if you link a business class or charm then it provides an additional approved understanding of what's actually held in a table or column okay so that that's it for the theory and let's go to the demo but this is a big data application what you're seeing right now is a desktop .

Version of it and I'm logged in to my repository this is something new you see when you okay get a child so it's four brand new blank repository and I have some some examples here so this is this item II say it's a business glossary and these elements are data dictionaries let's let's focus on the first thing and I will show you how you can define your .

Recovery okay maybe one word I'm already logged in when you run the program you will see this window there are two versions of the repository so so this is where all the metadata is stored there is a file I can click the create it on on your disk and there is a server you require a SQL Server database for that so this is a advice because it loves for .

Collaboration with with multiple people okay when I have already created that and I am ready to create my first glossary so I click Add and I create I click business glossary so it's called neo business glossary when I click here you can see that I haven't moved business terms yet so I click right click and choose term there .

Are a few other types of the the entries in the business area but let me focus on the business term only right now so let me create a customer and let's let's provide there a definition so let's keep it simple is you know vice entity advice business okay that's let's get into details I can also provide few attributes for that so .

Let's choose it the owner that's chose me so that would be useful for the management and let's indicate the status of that so there are a few statuses you can create your your own so let's say that this is active you know you could change it to approved or something like that so then later when you review the list you can see who to talk to and is .

It actually a valid business term okay I you can see here the two two attributes you couldn't you could add many more attributes but there are some suggestions those attributes are also for the tech detection area you could add your own metadata fields okay so this is my first glossary and the first term I can now share it in HTML so I .

Click export I click this option with business glossary click next next let's just just just my glossary okay and few times it just there is some configuration and it should open so you can see I have my list of terms so one term and I can see the term and its definition I can search there just one so let me .

Show you how it looks when you have a real glossary and the dictionary can see you have many databases I have a few glossary so if I search for customer okay yeah that's what it is that's the data dictionary action okay um let's get back to the program and defining my grocery so let's let's do a hierarchy of .

The terms so let's create a term called leaving customer and let's create returning and now you can with a simple drag-and-drop I can create a hierarchy of this of those terms which makes it easier here you can see it more clearly if you have a lot of that you can see them on the list but you can also make a hierarchy which which makes allows for .

Some organization of those of those terms like like folders you can also create a category if so I create I put leaving in a returning customer under the customer but I could also create a category so let's say customers and I could put all of this under under that category okay so that's that's a simple hierarchy and data allows you to define .

A complex network of relationships so let's let's create a synonym for a customer let's call it kind and right now if I go to the related term stop that it's it's empty but I can drop drop I'm sorry so here I can drop mmm the other term and I can indicate what is the type of this relationship so it's .

A Sai synonym of and I'm gonna save you can see that when I go to the client they are both visible on the both ends of in the both terms there are many different types of those relationships so let's let's go to the customer and this time let's hit this button and right now I can you can search for specific terms and I can not choose .

Multiple of that of the terms and so let's see create a relationship type of easy exit type of actually in this direction has a type and this leaving customer is a type of a customer and for the other one as well so you can see that I have for the customer have three related terms and when I go to the one of those so this this was a symmetrical .

Relationships and let's see the other ones it says it's a type of when I share it let's see what happens okay when I go to the customer I can see that it as a synonym of client and has a type of living customer and returning customer and on the other end as well okay so that's that's the relationship I can build multiple relationships entire .

Network you can see those types defined here and you could even extend this this list with your own own types of relationships okay so it's as simple as that to have a business glossary just repeat this process actually the difficult part is not the it's not working with the tool the difficult part with building a business clusters .

Actually identifying the terms and then agreeing on the definition so this is like 99% of work and this is just 1% of the work but we want to make this this part easy the easiest as possible okay right now so let me move to the data dictionary part um so let's leave the business closure for the moment and here I have two connections to the two data .

Dictionaries they represent a specific database specific connection to a physical database so you can see list of tables here right so these are tables imported from from data source you can see here that every table has a specification of columns so that there are technical attributes imported from the database and there are descriptions .

Provided by also imported but but they cannot later be provided by people working with the metadata in the kayid Oh as well you can you you can provide different attributes of your choice so the same castle fields as I showed before okay let's let's add a source so let's pick Adam choose database .

Connection this time I can choose from the list of connectors and if your database is not on the list you can use ODBC for generic connection like me I think that's let me provide the details of the connection CRM let's see per sample Sheeran so it here the tables click Next .

Let's call it CRM ok and I should get another item which is here with the list of tables that have been imported from from the database so you can see there are no descriptions but if there were I mean they will be here eating mine has also a relationship between the tables unique keys and so on ok so I can so having this the data .

Dictionary and business glossary I can I can link the two so let's let's say that company's table holds a customer so what I can do I can drag and drop and to the table to the customer term and it asked me if I want to create a link I say yes I want to create a link and what we should see on the linked arm stop there is a link over here so if I go I'm if I .

Go to the customer term then link data elements I can see this table ok it works also D in the opposite direction I could also drag and drop to the table but let's use this button which helps me scan the entire data catalog so I know that the customer is also kept in day mmm let's choose only tables in the contact .

Table contacts table right so I click and I have created another relationship but that's not all I told you you can link fields as well as specific columns so let's find for contact columns and you can see that there are many many columns that's what I said in the beginning that there are many many different comps so all this let's say it .

All represents the customer data and when I save I see the entire list over here when I export you I will see all this information in a much more convenient way so let's let's see how it looks oh sorry a customer so you can see I have my relationships and I have links to specific data elements when I click to that I transfer to the table page and .

You can see that this table is linked to a customer and I have a lookup of the the definition of this term and also I should yeah we get back and click some some other table I also see by each column the related term the same same thing as on the table level so that that is quite easy way to to build those two entities and link them together I think .

It's very useful for for everybody working with the data so they can easily look up the food things I told you so see about on top of the descriptions here is no description but I can look up what but that actually is and from this and I can also easily find there will be a database field as well so you'll see in which exact database .

Disease so it would say CRM over here so that would be quite quite useful okay um so it's as easy as that as simple as that I'll show one more thing that you can define your the business glossary from the data dictionary directly so let's let's check complaints and I think yeah when I try to link term and there is no .

Complaint okay so I can now create it I click new link both business glossary term I choose the glossary yeah and my term was created that's a singular name yeah of course I need to provide the definition and all the attributes but you can see on the list for instance if I have a status column I see what I need to be working but still so what needs to .

Be defined we can repeat the same thing with the columns so let's complaints again let's see I have a resolved date so I think that requires a specific definition because you know when is a complaint regarded as resolved is it when the operator indicated or the customer approved so I can repeat the same process and create new resolve date .

Let's call it a complaint resolved date okay so you can repeat that process as long as you have data elements you you think I require more definitions from from the business and you can see those those links over here so you can see which columns can drag and drop it here so you could actually which one you think still require .

Additional definition okay um so go back to my presentation so yeah these are just things I just showed you okay if yours about the future capabilities what you just saw is they is a desktop application with an expert you saw me each time mmm I'm clicking export you can optimize that but in a couple of months we'll have a live .

Portal where actually we'll be able to see everything online later we'll add the functionality for commenting and rating assets and terms so you can build the community around the ear data and glossary and we also add advanced search like I showed you a search but that would be more this one works like a filter but they're really much more .

Advanced searching capabilities okay so that was that was it for the presentation now we can go to the your questions I'm along here today so let me let me read through your questions and see if I can Kevin asks so if there are field 9 changes everything updated is everything updated dynamically yes so if I change the well if something changes .

In the source database actually this is this is something I would like to talk about on my webinar in three weeks I will show you how you can build and maintain the data dictionary right now I just just showed you the basics how can import that but it works that lets say this con changed so if it changes some attributes like data type or .

Relationships or whatever then yeah the links remain nothing changes but if it changes the name then this one is marked as deleted and the new one is created underneath so in a way if you change if you go in and edit the name of the of the term then yeah all the links reminder nothing changes okay let me know okay something Mohamed asked .

How do you think about the differences between data grocery and data profiling well they are different different things so data profiling is all about looking into into the data into the physical data so it's more linked to the big data data dictionary if you men take the dictionary than the data dictionaries I start to give you on the data and data .

Profiling is more dynamic so you actually see look into the tables see what the data is there so are there any so what's the distribution of values and so on so profiling is denied dynamic it changes and the dictionary is static okay the chart asked how to manage the the line each with a title so if you mean the line .

Edge data an image like data flow from one system to the other you currently cannot define that you could these custom fields quite basic thing let's say like source table source column you come to these items and you could do it man you're here so yeah you can define where the data came from there it wouldn't be dynamic so the .

Links wouldn't wouldn't be available and it wouldn't be imported from any detail we are planning to do it but later on in a couple of months maybe maybe a year or so right now the lineage you have the lineage from the business to the technical so you can call it also a lineage and this is what I just show you so these are the links between the .

Cluster and the addiction okay Kevin asks does work with different database software so if I had a MariaDB server and then SQL Server data the data dictionaries sync so the repository itself is SQL Server only if that's what you're asking but if you're working with different databases so like you have data warehouse on the .

Oracle or CRM on SQL Server and you know your e-commerce platform on the on mario TBH it works just fine all those databases can be different engines it doesn't matter because they are imported to the same same repository in the same structure so it doesn't matter we will be adding those connections and you can use the generic .

So you can put everything into one place okay Martin asks something in the pipeline related to the roles and responsibilities actually what I showed you here so the owner so this is this is something and regardless a roles and responsibilities you can create your own fields like perhaps we have some data .

Steward what you can create I don't like Q and Q a specialist right so even now I choose the type and other things so you can actually you can handle workloads with with fields and roles with those fields it will be adding on top of that in the future but I have no exact plans at the moment okay Terry asked how is the data to license .

Is it named user licensing sorry concurrent user licensing or a subscription pricing so it's it's a nine user and a subscription so what you are purchasing is a one-year license for a specific user and in this case if you'd like to build a data dictionary and business also an enterprise license per each person for .

Each person if your team floats then let's say somebody leaves any person comes in you can switch those licenses but this is not a floating type of license we're planning to edit in the future but currently you need a license paper user and the port which I told you are so read-only and also this this expert is .

Licensed at much smaller cost per each person accessing accessing the documentation again again the same the same idea and so Enterprise enterprise cost $1,900 per per year and the viewer added at this moment cost something about 90 dollars so it's much much much less expensive it would probably be more expecting the future so if you're .

Planning to bind and if you buy now the prices get locked so it's a good time to do it a couple of months it could be when we have the portal it could be more expensive okay so Terry again asks is the tool provided with all the database connector or you do you have to buy each database connector separately no no all this is included in in order written so .

There's one one release and you have all these there are no there are no paid connectors we will be adding more of these we are planning to add reporting all up databases some maybe 80s but probably not in the next twelve months that's all included okay Michael asked about SAP .

Let me read that you had mentioned SAP earlier in the presentation as the database to connect we have a similar database with only abbreviations as field names if descriptions are entered for each of the fields will it be possible to search those user entered through scriptures yes yes yes so um if let's put some .

Description here let me put my name over here and see how it works if I use search function yep so that's that's the whole point oh there are actually many instances of my name that's weird in trigger okay yeah because I created some some triggers yeah but you get the point .

Okay Kevin asks what about expert kappa kappa 3ds if i want to move to a new offering instead of the title all the data is stored in my open repository you can read this data it's not advice to actually modify the data if you don't know what you're doing but it's a spine database it's documented you can read you can integrate it with an anything .

You like you can move it to wherever you like and actually we also have description field you can export for some databases can export it back to the database so but it's just for some selected engines but basically i still have a full access to to the metadata yeah that's that being shot asks does it work with sap .

And DG and hana so you can connect to hana using the ODBC driver we have tested that i'm not sure what MDG stands for so i don't know if it's the database if we if it has a low bc connector then yes now we are planning to add the native Hana connector in couple of months okay Kevin asks again about the import if there's another data .

Dictionary under software I wanted to import into the data of yes we have a couple of options to import to date either one is I think axles Fred and the other one is we could also help you with if this is more advanced you can you can also do it yourself because we have a documentation how to import into the those those .

Fields into columns tables and so on so yes it is possible the repository is open you can upload your metadata we will add some interfacing tables in the future so you will be able to upload your meta data from any source to interfacing tables then hit import button and they will be imported to to our repository in a safe way okay let me .

See I'm reading through all the questions because there's quite many of them George asked I judge by the way is there a limitation on how many tables you can import in the data dictionary so we have tested it with Oracle I just showed you so that was the huge database with with with the screenshot and it worked with twenty with forty thousand .

Tables and views it was quite slowly took quite quite a while and we are improving on the performance but forty thousand tables and I think two million columns that worked fine so I think it can handle many production environments mark asks if I have a DB table with single table inheritance storing multiple terms of .

The same category how can I show this with the type field used to identify the term stored in the table okay that means see when I define the relationships between the terms I have a I have a field called Commons yeah I don't think we have it for the yeah it's not available here so I think what you're asking is is there .

Comment that describes more about the relationship I'm not sure if I understand that but we don't have it I think we could add it in the future so when you can you can link one field to multiple terms but you're right you cannot see what you know what the what's the difference between them and those comment function will I think allow you .

For that I hope that's that's what you meant choice asked if you could import the Excel spreadsheet rather than typing this this in all in currently we don't have that but yeah if you contact our support we have created the same thing for data dictionary it's quite simple you could even ask your IT to upload to the term stable there's just single .

Table and so the answer is yes we don't have it at the moment but we can have it easily if you need so yeah that's you yep sorry I think mark clarifies what he means an example one table organizations stores customers aren't supplier ok yeah yeah so let's let's try that so my I have exactly that right so I have I have a company table company's table right .

And links terms so I'm I think a customer right let's let's see if I have something like a supplier yeah I I don't know let's let's add this companies right ok and right save so now you have the link and if I expect it will be you have company stable so you have a customer and you'll have a supplier you wouldn't .

Have details about you know how to find the supplier how to find a customer exactly mmm so how to differentiate them but you can link the two different items into the same data set oh yeah so mark clarifies that the single column will determine the C the type so C would stand for customer and as for a supplier so currently you cannot do it you can .

Well yeah okay not not through the links but you can indicate it here mmm I think so let's say let's say this is a status so you can indicate it in the description that's one like C is a customer as supplier and you can add the same thing also in the description so yeah you have a way you can use existing fields to to indicate that I hope market .

Answers your question current asks asks if something's did not dynamic about the changes so if something changes in the in the source database if would it be applied to the to the data dictionary yes I'm not prepared to show you this at the moment as I said again as I said before you you can join my webinar in three weeks I .

Believe and I will show you how it works basically the tables dot and columns that have been renamed or deleted are indicated as deleted here and new items are created so it's up to you to rewrite or copy those existing you know metadata description fields and so onto the new item and you can delete the old one and all the new items tables columns and so .

On created automatically there's much more to to this there are many things you can do with the data dictionary that's why it has a separate webinar okay um I haven't notice it it's a it's 50 minutes already there are no more questions thank you for really a lot of interesting questions I can see that you .

Are really trying to build data dictionaries and groceries in the organization yeah so I can invite you to my next webinar on the data dictionary there will be a couple of more about documenting databases finding personal data you will be notified by a newsletter so thank you everyone for for joining it was a pleasure I hope it was .

Useful for you and hope to see you soon on another webinar


Most Popular