kkrishna_6
Krishan Kapoor
Introduction to Database management system
A database management system (DBMS) is a computer program (or more typically, a suite of them) designed to manage a database (a large set of structured data), and run operations on the data requested by numerous clients. Typical examples of DBMS use include accounting, human resources and customer support systems. Originally found only in large organizations with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office.
DBMS's are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support although nowadays these functions are left to the operating system.
DATABASE
Your company lives and dies by its data.
If you can easily enter your data - and quickly retrieve useful reports - your business will thrive.
If entering an order is a slow, multi-step operation - and massaging your data into meaningful reports takes hours and hours - your business will suffer.
WHAT IS A DATABASE?
A database is a complex tool used to store and retrieve information - quickly, efficiently - and safely. A database is an organized collection of data.
One possible definition is that a database is a collection of records stored in a computer in a systematic way, such that a computer program can consult it to answer questions. For better retrieval and sorting, each record is usually organized as a set of data elements (facts). The items retrieved in answer to queries become information that can be used to make decisions. The computer program used to manage and query a database is known as a database management system (DBMS). The properties and design of database systems are included in the study of information science.
Strictly speaking, the term database refers to the collection of related records, and the software should be referred to as the database management system or DBMS. When the context is unambiguous, however, many database administrators and programmers use the term database to cover both meanings.
Many professionals would consider a collection of data to constitute a database only if it has certain properties: for example, if the data is managed to ensure its integrity and quality, if it allows shared access by a community of users, if it has a schema, or if it supports a query language. However, there is no agreed definition of these properties.
Database management systems are usually categorized according to the data model that they support: relational, object-relational, network, and so on. The data model will tend to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
It should have these features:
Group data in a meaningful format: i.e. show the total unit sales of individual widgets broken down by month.
Enforce referential integrity. This is a fancy way of saying that if you have invoice items - they can only exist in the database if that invoice also exists.
Locate records quickly using an index. This allows you to find in an instant all employees that have worked at the company for over 5 years that live in a zip code of 75001.
CONSUMER VS PROFESSIONAL
Consumer databases such as Microsoft Access or dBase (used by many contact managers) cannot compete with the professional databases such as Microsoft SQL Server.
Client Server
When a consumer databases runs a report, all of the data must be transferred over the network to the user's computer. It is then sorted and processed.
Even though the report may contain only a page of data, many millions of bytes of data had to be transferred and processed locally to generate the report.
With a professional database, the data is processed at the server. When the client sends a request for a report to the server, the server processes the request and only returns the single page of data - instead of millions of bytes.
Multi-threaded
When a consumer database is used for a web site, it can only handle one request at a time - a single thread. This means that everyone has to wait until that request has been processed, however long it takes.
A professional database is multi-threaded and allows many simultaneous requests to be processed at the same time. Visitors to the web site will experience no long delays.
SQL SERVER 2000
Having recently surpassed Oracle in speed, and easily surpassing Oracle in price/performance and support, we have chosen Microsoft's SQL Server as our primary database.
SQL Server is available in clustered versions that can handle terabytes of data, and several smaller versions designed for small to mid-sized companies.
SQL Server is also available in a scaled-down version for even the smallest office that is best suited for only 5 simultaneous users and limited to 1GB of data.
SQL Server is the best choice for an Internet application, an IVR application, or a graphical program managing all of your customer information.
Terminology
A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data.
A database application is computer software written to manage the data of a particular application or problem.
THE HISTORICAL PERSPECTIVE
The earliest known use of the term data base was in June 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (Databank, a comparable term, had been used in the Washington Post newspaper as early as 1966.)
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of their IMS product.
For the CP/M and PC-DOS/MS-DOS operating systems.
During the 1980s, research activity focused on distributed database systems and database machines, but these developments had little effect on the market. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software engineering repositories,) and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result.
In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of startup companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
ADVANTAGES OF DATA BASE
• Data can be stored
• Standards can be enforced
• Security Restrictions can be applied
• Integrity can be maintained
• Conflicting requirements can be balanced
OBJECTIVES OF DATA BASE
• Reduce redundant data
• Avoid inconveniences
• Locate data in efficient manner
• Make enhancement to the data base easy to implement
• Make the data base easy to maintain
Database models
Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. An example of this is the relational model: all serious implementations of the relational model allow the creation of indexes which provide fast access to rows in a table if the values of certain columns are known.
A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as selection, projection, and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.
Flat model
This may not strictly qualify as a data model, as defined above.
The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password that might be used as a part of a system security database. Each row would have the specific password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This model is, incidentally, a basis of the spreadsheet.
Network model
The network model (defined by the CODASYL specification) organizes data using two fundamental constructs, called records and sets. Records contain fields (which may be organized hierarchically, as in COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.
The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values.
Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization.
Relational model(RDBMS)
The relational model was introduced in an academic paper by E. F. Codd in 1970 as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory.
The products that are generally referred to as relational databases (for example, Ingress, Oracle, DB2, and SQL Server) in fact implement a model that is only an approximation to the mathematical model defined by Codd. The data structures in these products are tables, rather than relations: the main differences being that tables can contain duplicate rows, and that the rows (and columns) can be treated as being ordered. The same criticism applies to the SQL language which is the primary interface to these products. There has been considerable controversy, mainly due to Codd himself, as to whether it is correct to describe SQL implementations as "relational": but the fact is that the world does so, and the following description uses the term in its popular sense.
A relational database contains multiple tables, each similar to the one in the "flat" database model. Relationships between tables are not defined explicitly; instead, keys are used to match up rows of data in different tables. A key is a collection of one or more columns in one table whose values match corresponding columns in other tables: for example, an Employee table may contain a column named Location which contains a value that matches the key of a Location table. Any column can be a key, or multiple columns can be grouped together into a single key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
A key that can be used to uniquely identify a row in a table is called a unique key. Typically one of the unique keys is the preferred way to refer to row; this is defined as the table's primary key.
A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number), is sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)
Relational operations
Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded.
a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.
Often, data from multiple tables gets combined into one, by doing a join. Conceptually, this is done by taking all possible combinations of rows (the "cross-product"), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of techniques.
The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made the idea and implementation of relational databases very popular with businesses.
Tuple
In mathematics, a tuple is a finite sequence of objects, that is, a list of a limited number of objects (an infinite sequence is a family). Tuples are used to describe mathematical objects that consist of certain components. For example, a directed graph is defined as a tuple (V, E) where V is the set of nodes and E is a subset of V × V that denotes the edges.
Names of tuples
The term originated as an abstraction of the sequence: single, double, triple, quadruple, quintuple ... n-tuple. A tuple of length n is usually described as an n-tuple. A 2-tuple can be either an ordered pair or an unordered pair; a 3-tuple is a triple or triplet. The n can be any positive integer; thus one can for example say that a quaternion can be represented as a 4-tuple, and further constructed names are possible, such as octuple, but many mathematicians find it quicker to write "8-tuple", even if still pronouncing "octuple".
Formal definitions
The main properties that distinguish a tuple from, for example, a set are that (1) it can contain an object more than once and (2) the objects appear in a certain order. Note that (1) distinguishes it from a multiset and that (2) distinguishes it from an ordered set. This is often formalized by giving the following rule for the identity of two n-tuples:
(a1, a2, ...,an) = (b1, b2, ..., bn) if a1 = b1, a2 = b2 and so on.
Another way of formalizing tuples is by mapping them to more primitive constructs in set theory such as ordered pairs. For example, an n-tuple (with n > 2) can be defined as an ordered pair of its first entry and an (n−1)-tuple containing the remaining entries:
(a1, a2, ..., an) = (a1, (a2, ..., an))
Using the usual set-theoretic definition of an ordered pair and letting the empty set represent the empty tuple, this results in the following inductive definition:
the 0-tuple (i.e. the empty tuple) is represented by ∅
if x is an n-tuple then {{a}, {a, x}} is an (n + 1)-tuple.
Using this definition, (1,2,2) would be
(1,(2,(2,()))) = (1,(2, {{2}, {2, ∅}} )) = (1, {{2}, {2, {{2}, {2, ∅}}}} ) = {{1}, {1, {{2}, {2, {{2}, {2, ∅}}}}}}
There is an important similarity here with the way Lisp originally used the ordered pair abstraction to inductively create all of its n-tuple and list structures:
a special symbol NIL represents the empty list;
if X is a list and A an arbitrary value then the pair (A, X) represents a list with the head (i.e. first element) A and the tail (i.e. the remainder of the list without the head) X.
Usage in computer science
In computer science, tuple can have two distinct meanings. Typically in functional and some other programming languages, a tuple is a data object that holds several objects, similar to a mathematical tuple. Such an object is also known as a record.
In some languages and especially in database theory, a tuple is usually defined as a finite function that maps field names to a certain value. Its purpose is the same as in mathematics, namely to indicate that a certain entity or object consists of certain components and/or has certain properties, but here these components are identified by a unique field name and not by a position, which often leads to a more user-friendly notation.
A small example of a tuple would be:
( player : "Harry", score : 25 )
which is a function that maps the field name "player" to the string "Harry" and the field name "score" to the number 25. Note that the order of the components is not relevant, so the same tuple can also be written as:
( score : 25, player : "Harry" )
In the relational model such tuples are typically used to represent a single proposition, in this case that there exists a player with the name "Harry" and a score of 25.
In programming languages tuples are used to form data structures. For example, the following could be a structure that represents a node in a doubly linked list:
(value: 16, previous-node : 1174782, next-node : 1174791 )
Dimensional model
The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses in a way that data can be easily summarized using OLAP queries. In the dimensional model, a database consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as revenue. It's important that measures can be meaningfully aggregated - for example, the revenue from different locations can be added together.
In an OLAP query, dimensions are chosen and the facts are grouped and added together to create a summary.
The dimensional model is often implemented on top of the relational model using a star schema, consisting of one table containing the facts and surrounding tables containing the dimensions. Particularly complicated dimensions might be represented using multiple tables, resulting in a snowflake schema.
A data warehouse can contain multiple star schemas that share dimension tables, allowing them to be used together. Coming up with a standard set of dimensions is an important part of dimensional modeling?
Object database models
In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (t typically as objects). At the same time object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
A variety of these ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
Object databases suffered because of a lack of standardization: although standards were defined by ODMG, they were never implemented well enough to ensure interoperability between products. Nevertheless, object databases have been used successfully in many applications: usually specialized applications such as engineering databases or molecular biology databases rather than mainstream commercial data processing. However, object database ideas were picked up by the relational vendors and influenced extensions made to these products and indeed to the SQL language.
Database Internals
Indexing
All of these kinds of database can take advantage of indexing to increase their speed, and this technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Various methods of indexing are commonly used; B-trees, hashes, and linked lists are all common indexing techniques.
Relational DBMSs have the advantage that indices can be created or dropped without changing existing applications making use of it. The database chooses between many different strategies based on which one it estimates will run the fastest.
Relational DBMSs utilize many different algorithms to compute the result of an SQL statement. The RDBMS will produce a plan of how to execute the query, which is generated by analyzing the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are Nested Loops Join, Sort-Merge Join and Hash Join.
Transactions and concurrency
In addition to their data model, most practical databases ("transactional databases") attempt to enforce a database transaction model that has desirable data integrity properties. Ideally, the database software should enforce the ACID rules, summarized here:
• Atomicity - Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
• Consistency - Every transaction must preserve the integrity constraints -- the declared consistency rules -- of the database. It cannot place the data in a contradictory state.
• Isolation - Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
• Durability - Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
In practice, many DBMS's allow most of these rules to be selectively relaxed for better performance.
Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only fertilizable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
Replication
Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve Performance or Availability of the whole database system. Common replication concepts include:
• Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves
• Quorum: The result of Read and Write requests is calculated by querying a "majority" of replicas.
• Multi-master: Two or more replicas sync each other via a transaction identifier.
Applications of databases
Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multi-user applications, where coordination between many users is needed. Even individual users find them convenient, though, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common application programming interface (API) to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.
Common Database Brands
(In alphabetical order)
• 4D
• Adabas
• Corel Paradox
• DB2
• FileMaker Pro
• FirebirdSQL
• IMS
• Informix
• Microsoft Access
• Microsoft SQL Server
• MySQL
• OOo Base
• Oracle
• PostgreSQL
• SQLite
• Sybase SQL Server
Entities and Attributes
Items about which we store information are referred to as entities. An entity may be a tangible object, such as an employee, a part, or a place. It may be non tangible, such as an event, a job title, a customer account, a profit center, or an abstract concept. Often in data processing we are concerned with a collection of similar entities, such as employees, and wish to record information about the same properties of each of them. Such a collection of similar entities are referred to as entity set.
Entity Identifier:
It is necessary for the programmer or data administrator to be able to refer to a record or group relating to a given entity, and it is necessary for the computer to be able to identify it and have a means of finding it in the storage unit. For this purpose one of the data items are usually designated an entity identifier. The entity identifier of an account record would be the ACCOUNT-NUMBER. The entity identifier must be unique; no other entity may have the same value of this particular data item.
Primary Key:
The entity identifier is referred to as the key of the record or group. When more than one data item is used in the key they will be shown joined by a + symbol: FLIGHT-NUMBER + DATE. The combination is referred to as a concatenated key. Sometimes three, four or even five items are needed to form the primary key.
The primary key is defined as that data item or combination of data items used to uniquely identify one record or group. The primary key is of great importance because it is used by the computer in locating the record, by means of an index or other addressing technique.
Secondary Keys:
The computer may also use a key which does not identify a unique record or tuple but which identifies all those which have certain property. This is referred to as secondary key. A value of the attribute COLOR may be used as a secondary key, for example, BLUE. Sometimes a file has many secondary keys which are used to search for records with given properties.
Candidate Key:
The keys other than primary and secondary are referred to as candidate key.
Database management systems:
The schemas and sub-schemas are both used by the database management system, the primary function of which is to serve the application programs by executing their data operations.
The main events that occur when an application program reads a record by means of a database management system are as follows:
1. The database management system obtains the schema (or global logical data description) and determines which logical data type or types are needed.
2. The database management system obtains the sub-schema (or program data description) that is used by application program A and looks up the description of the data in a question.
3. The database management system examines the physical database description and determines which physical record or records to read.
4. The database management system issued a command to the computer operating system, instructing it to read the requisite record(s).
5. The database management system transfers the data from the system buffers to the work area of application program.
6. The database management system provides status information to the application program on the outcome of its call, including any error indications.
7. The application program can then operate with the data in its work area.
ADDRESSING AND SEARCHING:
The events are complicated by the fact that the required record usually cannot be located directly from the key that is provided by the application program. The record address must be obtained before the input\output routines of the operating system can read or write the record. The program which reads the index records and finds the address may be a part of the operating system. It may be programmed routine which can be called into use by the database management system.
Whatever the technique for addressing or locating a record, the program for accomplishing it can be one of three categories:
1. It may be a routine called into use by application programs.
2. It may be a routine called into use by the database management systems.
3. It may be a facility of the operating system commonly called an operating system access method.
Bibliography:
Websites:
1. www.wikipedia.com
2. www.webopedia.com
3. www.databasecorner.com
4. www.greenspun.com
5. www.commerce –database.com
6. www.microsoft.com
7. www.sunmicrosystems.com
Search engines:
1. www.google.com
2. www.google.co.in
3. www.askjeeves.com
4. www.amazon.com
5. www.msn.com
6. www.searchengine.com
Books:
1. Computer Database Organization
- James Martin.
A database management system (DBMS) is a computer program (or more typically, a suite of them) designed to manage a database (a large set of structured data), and run operations on the data requested by numerous clients. Typical examples of DBMS use include accounting, human resources and customer support systems. Originally found only in large organizations with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office.
DBMS's are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support although nowadays these functions are left to the operating system.
DATABASE
Your company lives and dies by its data.
If you can easily enter your data - and quickly retrieve useful reports - your business will thrive.
If entering an order is a slow, multi-step operation - and massaging your data into meaningful reports takes hours and hours - your business will suffer.
WHAT IS A DATABASE?
A database is a complex tool used to store and retrieve information - quickly, efficiently - and safely. A database is an organized collection of data.
One possible definition is that a database is a collection of records stored in a computer in a systematic way, such that a computer program can consult it to answer questions. For better retrieval and sorting, each record is usually organized as a set of data elements (facts). The items retrieved in answer to queries become information that can be used to make decisions. The computer program used to manage and query a database is known as a database management system (DBMS). The properties and design of database systems are included in the study of information science.
Strictly speaking, the term database refers to the collection of related records, and the software should be referred to as the database management system or DBMS. When the context is unambiguous, however, many database administrators and programmers use the term database to cover both meanings.
Many professionals would consider a collection of data to constitute a database only if it has certain properties: for example, if the data is managed to ensure its integrity and quality, if it allows shared access by a community of users, if it has a schema, or if it supports a query language. However, there is no agreed definition of these properties.
Database management systems are usually categorized according to the data model that they support: relational, object-relational, network, and so on. The data model will tend to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.
It should have these features:
Group data in a meaningful format: i.e. show the total unit sales of individual widgets broken down by month.
Enforce referential integrity. This is a fancy way of saying that if you have invoice items - they can only exist in the database if that invoice also exists.
Locate records quickly using an index. This allows you to find in an instant all employees that have worked at the company for over 5 years that live in a zip code of 75001.
CONSUMER VS PROFESSIONAL
Consumer databases such as Microsoft Access or dBase (used by many contact managers) cannot compete with the professional databases such as Microsoft SQL Server.
Client Server
When a consumer databases runs a report, all of the data must be transferred over the network to the user's computer. It is then sorted and processed.
Even though the report may contain only a page of data, many millions of bytes of data had to be transferred and processed locally to generate the report.
With a professional database, the data is processed at the server. When the client sends a request for a report to the server, the server processes the request and only returns the single page of data - instead of millions of bytes.
Multi-threaded
When a consumer database is used for a web site, it can only handle one request at a time - a single thread. This means that everyone has to wait until that request has been processed, however long it takes.
A professional database is multi-threaded and allows many simultaneous requests to be processed at the same time. Visitors to the web site will experience no long delays.
SQL SERVER 2000
Having recently surpassed Oracle in speed, and easily surpassing Oracle in price/performance and support, we have chosen Microsoft's SQL Server as our primary database.
SQL Server is available in clustered versions that can handle terabytes of data, and several smaller versions designed for small to mid-sized companies.
SQL Server is also available in a scaled-down version for even the smallest office that is best suited for only 5 simultaneous users and limited to 1GB of data.
SQL Server is the best choice for an Internet application, an IVR application, or a graphical program managing all of your customer information.
Terminology
A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data.
A database application is computer software written to manage the data of a particular application or problem.
THE HISTORICAL PERSPECTIVE
The earliest known use of the term data base was in June 1963, when the System Development Corporation sponsored a symposium under the title Development and Management of a Computer-centered Data Base. Database as a single word became common in Europe in the early 1970s and by the end of the decade it was being used in major American newspapers. (Databank, a comparable term, had been used in the Washington Post newspaper as early as 1966.)
The first database management systems were developed in the 1960s. A pioneer in the field was Charles Bachman. Bachman's early papers show that his aim was to make more effective use of the new direct access storage devices becoming available: until then, data processing had been based on punched cards and magnetic tape, so that serial processing was the dominant activity. Two key data models arose at this time: CODASYL developed the network model based on Bachman's ideas, and (apparently independently) the hierarchical model was used in a system developed by North American Rockwell, later adopted by IBM as the cornerstone of their IMS product.
For the CP/M and PC-DOS/MS-DOS operating systems.
During the 1980s, research activity focused on distributed database systems and database machines, but these developments had little effect on the market. Another important theoretical idea was the Functional Data Model, but apart from some specialized applications in genetics, molecular biology, and fraud investigation, the world took little notice.
In the 1990s, attention shifted to object-oriented databases. These had some success in fields where it was necessary to handle more complex data than relational systems could easily cope with, such as spatial databases, engineering data (including software engineering repositories,) and multimedia data. Some of these ideas were adopted by the relational vendors, who integrated new features into their products as a result.
In the 2000s, the fashionable area for innovation is the XML database. As with object databases, this has spawned a new collection of startup companies, but at the same time the key ideas are being integrated into the established relational products. XML databases aim to remove the traditional divide between documents and data, allowing all of an organization's information resources to be held in one place, whether they are highly structured or not.
ADVANTAGES OF DATA BASE
• Data can be stored
• Standards can be enforced
• Security Restrictions can be applied
• Integrity can be maintained
• Conflicting requirements can be balanced
OBJECTIVES OF DATA BASE
• Reduce redundant data
• Avoid inconveniences
• Locate data in efficient manner
• Make enhancement to the data base easy to implement
• Make the data base easy to maintain
Database models
Various techniques are used to model data structure. Most database systems are built around one particular data model, although it is increasingly common for products to offer support for more than one model. For any one logical model various physical implementations may be possible, and most products will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance. An example of this is the relational model: all serious implementations of the relational model allow the creation of indexes which provide fast access to rows in a table if the values of certain columns are known.
A data model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as selection, projection, and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.
Flat model
This may not strictly qualify as a data model, as defined above.
The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password that might be used as a part of a system security database. Each row would have the specific password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This model is, incidentally, a basis of the spreadsheet.
Network model
The network model (defined by the CODASYL specification) organizes data using two fundamental constructs, called records and sets. Records contain fields (which may be organized hierarchically, as in COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.
The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values.
Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization.
Relational model(RDBMS)
The relational model was introduced in an academic paper by E. F. Codd in 1970 as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory.
The products that are generally referred to as relational databases (for example, Ingress, Oracle, DB2, and SQL Server) in fact implement a model that is only an approximation to the mathematical model defined by Codd. The data structures in these products are tables, rather than relations: the main differences being that tables can contain duplicate rows, and that the rows (and columns) can be treated as being ordered. The same criticism applies to the SQL language which is the primary interface to these products. There has been considerable controversy, mainly due to Codd himself, as to whether it is correct to describe SQL implementations as "relational": but the fact is that the world does so, and the following description uses the term in its popular sense.
A relational database contains multiple tables, each similar to the one in the "flat" database model. Relationships between tables are not defined explicitly; instead, keys are used to match up rows of data in different tables. A key is a collection of one or more columns in one table whose values match corresponding columns in other tables: for example, an Employee table may contain a column named Location which contains a value that matches the key of a Location table. Any column can be a key, or multiple columns can be grouped together into a single key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
A key that can be used to uniquely identify a row in a table is called a unique key. Typically one of the unique keys is the preferred way to refer to row; this is defined as the table's primary key.
A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number), is sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)
Relational operations
Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded.
a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.
Often, data from multiple tables gets combined into one, by doing a join. Conceptually, this is done by taking all possible combinations of rows (the "cross-product"), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of techniques.
The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for decades. This has made the idea and implementation of relational databases very popular with businesses.
Tuple
In mathematics, a tuple is a finite sequence of objects, that is, a list of a limited number of objects (an infinite sequence is a family). Tuples are used to describe mathematical objects that consist of certain components. For example, a directed graph is defined as a tuple (V, E) where V is the set of nodes and E is a subset of V × V that denotes the edges.
Names of tuples
The term originated as an abstraction of the sequence: single, double, triple, quadruple, quintuple ... n-tuple. A tuple of length n is usually described as an n-tuple. A 2-tuple can be either an ordered pair or an unordered pair; a 3-tuple is a triple or triplet. The n can be any positive integer; thus one can for example say that a quaternion can be represented as a 4-tuple, and further constructed names are possible, such as octuple, but many mathematicians find it quicker to write "8-tuple", even if still pronouncing "octuple".
Formal definitions
The main properties that distinguish a tuple from, for example, a set are that (1) it can contain an object more than once and (2) the objects appear in a certain order. Note that (1) distinguishes it from a multiset and that (2) distinguishes it from an ordered set. This is often formalized by giving the following rule for the identity of two n-tuples:
(a1, a2, ...,an) = (b1, b2, ..., bn) if a1 = b1, a2 = b2 and so on.
Another way of formalizing tuples is by mapping them to more primitive constructs in set theory such as ordered pairs. For example, an n-tuple (with n > 2) can be defined as an ordered pair of its first entry and an (n−1)-tuple containing the remaining entries:
(a1, a2, ..., an) = (a1, (a2, ..., an))
Using the usual set-theoretic definition of an ordered pair and letting the empty set represent the empty tuple, this results in the following inductive definition:
the 0-tuple (i.e. the empty tuple) is represented by ∅
if x is an n-tuple then {{a}, {a, x}} is an (n + 1)-tuple.
Using this definition, (1,2,2) would be
(1,(2,(2,()))) = (1,(2, {{2}, {2, ∅}} )) = (1, {{2}, {2, {{2}, {2, ∅}}}} ) = {{1}, {1, {{2}, {2, {{2}, {2, ∅}}}}}}
There is an important similarity here with the way Lisp originally used the ordered pair abstraction to inductively create all of its n-tuple and list structures:
a special symbol NIL represents the empty list;
if X is a list and A an arbitrary value then the pair (A, X) represents a list with the head (i.e. first element) A and the tail (i.e. the remainder of the list without the head) X.
Usage in computer science
In computer science, tuple can have two distinct meanings. Typically in functional and some other programming languages, a tuple is a data object that holds several objects, similar to a mathematical tuple. Such an object is also known as a record.
In some languages and especially in database theory, a tuple is usually defined as a finite function that maps field names to a certain value. Its purpose is the same as in mathematics, namely to indicate that a certain entity or object consists of certain components and/or has certain properties, but here these components are identified by a unique field name and not by a position, which often leads to a more user-friendly notation.
A small example of a tuple would be:
( player : "Harry", score : 25 )
which is a function that maps the field name "player" to the string "Harry" and the field name "score" to the number 25. Note that the order of the components is not relevant, so the same tuple can also be written as:
( score : 25, player : "Harry" )
In the relational model such tuples are typically used to represent a single proposition, in this case that there exists a player with the name "Harry" and a score of 25.
In programming languages tuples are used to form data structures. For example, the following could be a structure that represents a node in a doubly linked list:
(value: 16, previous-node : 1174782, next-node : 1174791 )
Dimensional model
The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses in a way that data can be easily summarized using OLAP queries. In the dimensional model, a database consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as revenue. It's important that measures can be meaningfully aggregated - for example, the revenue from different locations can be added together.
In an OLAP query, dimensions are chosen and the facts are grouped and added together to create a summary.
The dimensional model is often implemented on top of the relational model using a star schema, consisting of one table containing the facts and surrounding tables containing the dimensions. Particularly complicated dimensions might be represented using multiple tables, resulting in a snowflake schema.
A data warehouse can contain multiple star schemas that share dimension tables, allowing them to be used together. Coming up with a standard set of dimensions is an important part of dimensional modeling?
Object database models
In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (t typically as objects). At the same time object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
A variety of these ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.
Object databases suffered because of a lack of standardization: although standards were defined by ODMG, they were never implemented well enough to ensure interoperability between products. Nevertheless, object databases have been used successfully in many applications: usually specialized applications such as engineering databases or molecular biology databases rather than mainstream commercial data processing. However, object database ideas were picked up by the relational vendors and influenced extensions made to these products and indeed to the SQL language.
Database Internals
Indexing
All of these kinds of database can take advantage of indexing to increase their speed, and this technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Various methods of indexing are commonly used; B-trees, hashes, and linked lists are all common indexing techniques.
Relational DBMSs have the advantage that indices can be created or dropped without changing existing applications making use of it. The database chooses between many different strategies based on which one it estimates will run the fastest.
Relational DBMSs utilize many different algorithms to compute the result of an SQL statement. The RDBMS will produce a plan of how to execute the query, which is generated by analyzing the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are Nested Loops Join, Sort-Merge Join and Hash Join.
Transactions and concurrency
In addition to their data model, most practical databases ("transactional databases") attempt to enforce a database transaction model that has desirable data integrity properties. Ideally, the database software should enforce the ACID rules, summarized here:
• Atomicity - Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
• Consistency - Every transaction must preserve the integrity constraints -- the declared consistency rules -- of the database. It cannot place the data in a contradictory state.
• Isolation - Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
• Durability - Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.
In practice, many DBMS's allow most of these rules to be selectively relaxed for better performance.
Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only fertilizable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
Replication
Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve Performance or Availability of the whole database system. Common replication concepts include:
• Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves
• Quorum: The result of Read and Write requests is calculated by querying a "majority" of replicas.
• Multi-master: Two or more replicas sync each other via a transaction identifier.
Applications of databases
Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multi-user applications, where coordination between many users is needed. Even individual users find them convenient, though, and many electronic mail programs and personal organizers are based on standard database technology. Software database drivers are available for most database platforms so that application software can use a common application programming interface (API) to retrieve the information stored in a database. Two commonly used database APIs are JDBC and ODBC.
Common Database Brands
(In alphabetical order)
• 4D
• Adabas
• Corel Paradox
• DB2
• FileMaker Pro
• FirebirdSQL
• IMS
• Informix
• Microsoft Access
• Microsoft SQL Server
• MySQL
• OOo Base
• Oracle
• PostgreSQL
• SQLite
• Sybase SQL Server
Entities and Attributes
Items about which we store information are referred to as entities. An entity may be a tangible object, such as an employee, a part, or a place. It may be non tangible, such as an event, a job title, a customer account, a profit center, or an abstract concept. Often in data processing we are concerned with a collection of similar entities, such as employees, and wish to record information about the same properties of each of them. Such a collection of similar entities are referred to as entity set.
Entity Identifier:
It is necessary for the programmer or data administrator to be able to refer to a record or group relating to a given entity, and it is necessary for the computer to be able to identify it and have a means of finding it in the storage unit. For this purpose one of the data items are usually designated an entity identifier. The entity identifier of an account record would be the ACCOUNT-NUMBER. The entity identifier must be unique; no other entity may have the same value of this particular data item.
Primary Key:
The entity identifier is referred to as the key of the record or group. When more than one data item is used in the key they will be shown joined by a + symbol: FLIGHT-NUMBER + DATE. The combination is referred to as a concatenated key. Sometimes three, four or even five items are needed to form the primary key.
The primary key is defined as that data item or combination of data items used to uniquely identify one record or group. The primary key is of great importance because it is used by the computer in locating the record, by means of an index or other addressing technique.
Secondary Keys:
The computer may also use a key which does not identify a unique record or tuple but which identifies all those which have certain property. This is referred to as secondary key. A value of the attribute COLOR may be used as a secondary key, for example, BLUE. Sometimes a file has many secondary keys which are used to search for records with given properties.
Candidate Key:
The keys other than primary and secondary are referred to as candidate key.
Database management systems:
The schemas and sub-schemas are both used by the database management system, the primary function of which is to serve the application programs by executing their data operations.
The main events that occur when an application program reads a record by means of a database management system are as follows:
1. The database management system obtains the schema (or global logical data description) and determines which logical data type or types are needed.
2. The database management system obtains the sub-schema (or program data description) that is used by application program A and looks up the description of the data in a question.
3. The database management system examines the physical database description and determines which physical record or records to read.
4. The database management system issued a command to the computer operating system, instructing it to read the requisite record(s).
5. The database management system transfers the data from the system buffers to the work area of application program.
6. The database management system provides status information to the application program on the outcome of its call, including any error indications.
7. The application program can then operate with the data in its work area.
ADDRESSING AND SEARCHING:
The events are complicated by the fact that the required record usually cannot be located directly from the key that is provided by the application program. The record address must be obtained before the input\output routines of the operating system can read or write the record. The program which reads the index records and finds the address may be a part of the operating system. It may be programmed routine which can be called into use by the database management system.
Whatever the technique for addressing or locating a record, the program for accomplishing it can be one of three categories:
1. It may be a routine called into use by application programs.
2. It may be a routine called into use by the database management systems.
3. It may be a facility of the operating system commonly called an operating system access method.
Bibliography:
Websites:
1. www.wikipedia.com
2. www.webopedia.com
3. www.databasecorner.com
4. www.greenspun.com
5. www.commerce –database.com
6. www.microsoft.com
7. www.sunmicrosystems.com
Search engines:
1. www.google.com
2. www.google.co.in
3. www.askjeeves.com
4. www.amazon.com
5. www.msn.com
6. www.searchengine.com
Books:
1. Computer Database Organization
- James Martin.