Case Study on Distributed Object Store Principles of Operation - Hitachi

Description
Distributed object communication realizes communication between distributed objects in the distributed computing environment. The main role is to interconnect objects residing in non-local memory space and allow them to perform remote calls and exchange data.

Hitachi Data Systems
White Paper
Distributed Object Store Principles of Operation: The Case for Intelligent Storage

By Robert Primmer
Table of Contents
Executive Summary
Introduction
Terminology
Document Organization
Distributed Object Store
Operational Definitions
Basic Model
Problems to Solve
DOS Characteristics
Common Characteristics
Differentiation
Futures
Rise of Intelligent Storage
Beyond Archive
Summary
Appendix A
Acknowledgments
Author
References
Executive Summary
This paper examines the growth of distributed object stores (DOS) and the underlying
mechanisms that guide their use and development. The focus is on the fundamental principles
of operation that define this class of system, how it has evolved and where it is heading as new
markets expand beyond the use originally presented. We conclude by speculating how object
stores as a class must evolve to meet the more demanding requirements of future applications.
Introduction
While the concept of object stores has existed in the academic realm for over a decade [23], [28],
there have been relatively few commercial implementations. In the first decade of the 21st century
a handful of commercial offerings emerged with the enterprise sector as their initial target market.
As we enter the second decade we see use of object stores generalized and expanded to suit the
"cloud" use case.
To successfully move from concept to commercial product requires that you solve a business
problem. For new products, success is typically achieved either by creating a whole new market or
by taking an existing problem and trying to solve it better (also known as optimization), with the latter
being the approach taken more frequently as it is usually easier to improve upon existing work than
to create something completely new.
For commercial object stores, the initial target of opportunity was to provide a spinning disk
alternative for business data archived to slower media (such as tape or optical). The primary value
proposition was conceptually simple: the globalization of business in the 21st century meant that
the pace of business was accelerating [8]; thus, rapid and ready access to the
data upon which business relies had to increase accordingly.
Whatever their other values, tape and optical were designed to efficiently write data sequentially;
thus, these media were destined to fail the first test of rapid random data retrieval. They failed the
second test of ready access due to the mechanical, and sometimes human, element in the data
retrieval process itself. Before the data can be read, the medium must first be located and loaded
through some combination of human and mechanical processes; thus, it can take minutes to
retrieve a single file. While there are ways to improve time to first byte (TTFB) through caching and
read pattern optimizations, these media simply weren't designed to be efficient at random data
access and were therefore never as fast as spinning disk for random read patterns in the general
case.
As tape and optical were most frequently used for data archival, the first commercial object stores
were built with data archive as their initial design center. It is important to note that there is
nothing intrinsic to objects, and by extension object stores, that limits their application to data
archive. However, commercial object stores came about at a time when a pressing business
problem to be solved was to make archived data more useful to the business by substantially
reducing the time required to randomly read stored data. Essentially, this meant reducing TTFB by
one to two orders of magnitude.
This was not the only business problem proffered as the rationale for changing the medium of
archive from tape or optical to spinning disk. The second most common rationale provided was
data durability. The nature of archive data is that the storage time horizon changes from months to
decades. Since tapes tend to degrade over time [13] and optical formats frequently change [14],
the second value a spinning disk solution brought was the notion of easy and continual upgrade to
new and denser hard disk drives or solid state drives (HDD/SSD). This not only solves the problem of
medium obsolescence, which is itself a substantial problem if archive data is of any real value, but
simultaneously allows customers to ride the attractive cost curve established with HDD, which has
proven exceptionally beneficial to storage consumers¹.
All these factors combined to form fertile ground for a new class of storage to emerge. And a
market was born.
Terminology
DDB - Distributed Database
DOS - Distributed Object Store
HDD - Hard Disk Drive
HDS - Hitachi Data Systems
SSD - Solid State Drive
TTFB - Time to First Byte
Document Organization
The remainder of the document is organized as follows. The Distributed Object Store section talks
about the general principles of operation for distributed object stores as a category, examining
them from a systems, market and client view and describing the problems they seek to solve. The
DOS Characteristics section presents characteristics that are common to present-day DOS and
discusses where they differ. The Futures section looks at how object stores as a class must evolve
to meet the more demanding requirements of future applications. In conclusion, the Summary
section briefly revisits the topics covered in this paper.
Distributed Object Store
Operational Definitions
In computer science the definitions of terms are frequently overloaded, sometimes varying
considerably. We begin this section by providing our operating definitions of some common terms
used throughout this paper.
Object
The term object is purposefully generic so it can be applied to many things. For the purposes of
this paper, the term object will be used to denote a data construct that has at least two constituent
parts: data and metadata, where data represents the client data and metadata represents an
arbitrary set of information that is in some way connected to the (client) data. Therefore, an object
minimally equals the union of data plus metadata.
¹ Many in technology are familiar with Moore's law, which observes a doubling in processor speed roughly every 18 months.
Less familiar is Kryder's law, which observes a 50-million-fold increase in storage capacity since the introduction of the first disk
drive by IBM in 1956 [4]. In fact, over the past four years, HDD areal density (measured as gigabits per square inch) has been
doubling roughly every 11 months, whereas processor capacity has been doubling somewhat less than every 18 to 24 months
[5].
Note that from the client's perspective, each discrete object is essentially atomic when viewed as a
storage element. However, from the perspective of the object store, a single user object can result in
many fragments, possibly dispersed throughout one or more clusters that constitute the full DOS.
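The definition above can be made concrete with a minimal sketch. It assumes an object is simply a payload plus a metadata dictionary, and that the store splits the payload into fixed-size fragments. The names StoredObject, fragment and reassemble, and the fixed-size layout, are hypothetical illustrations and do not describe any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object in the sense used here: client data plus arbitrary metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


def fragment(obj: StoredObject, fragment_size: int) -> list[bytes]:
    """Split the client data into fixed-size fragments (hypothetical layout)."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return [obj.data[i:i + fragment_size]
            for i in range(0, len(obj.data), fragment_size)]


def reassemble(fragments: list[bytes]) -> bytes:
    """Coalesce fragments back into the single payload the client sees."""
    return b"".join(fragments)


# The client hands over one object; the store may hold several fragments.
obj = StoredObject(data=b"quarterly-report-contents",
                   metadata={"owner": "finance"})
pieces = fragment(obj, fragment_size=8)
restored = reassemble(pieces)
```

The client only ever sees obj and restored; the list of pieces is internal to the store, which is what makes the object appear atomic from the outside.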
Distributed Object Store
An object store is a collection of loosely coupled objects that may or may not have relation to any
other object residing within the same object store. At present a canonical structure does not exist
for an object store such as one finds in a traditional hierarchical file system². However, some facade
representing a structure recognizable by a human user may be presented, typically to allow end user
traversal of the object store.
The terms "object store" and "distributed object store" can essentially be used as synonyms as the
distinction becomes one of distance. However, there is no universally accepted definition of how
much distance must be maintained between clusters to be considered a "distributed" object store.
At its simplest, if objects can be dispersed across a set of physically discrete hardware elements
(such as nodes), the object store is distributed. An additional qualification is sometimes applied
where the distance between the hardware elements is expected to extend beyond the confines of a
single data center, perhaps extending to different geographies.
For the purposes of this paper the simpler definition will be used, as it expands the set of solutions
that can be considered without requiring repeated qualification of the terms.
Distributed Database
Not surprisingly, there are several definitions for what constitutes a distributed or decentralized
database (DDB). For the purposes of this paper a database will be considered to be distributed if it
follows the same principles described earlier for an object store. Perhaps the best known example
of a DDB is DNS [17].
The concept of a DDB is important to a distributed object store because, at its core, a DOS is
software built on top of a DDB. Something has to do the heavy lifting of keeping track of billions
of discrete object fragments and coalescing these back into a cogent, atomic object usable by
applications and humans alike, and that job falls primarily to the DDB. It is the strength of the DDB
that will determine the strength of the DOS; therefore, the tolerance limits of the DDB will be the
tolerance limits of the DOS itself. For example, the upper bound on the number of objects a DOS
can handle is the number of objects the underlying DDB can handle.
In the case of hierarchical file systems, the burden of location awareness for each visible element
that makes up the object collection (such as separate data and metadata files) is placed on the
client. As there is no agreed file system construct for the notion of attaching arbitrary metadata to a
user file (such as by using extended attributes [1]), it is incumbent upon the client to create multiple
files to achieve this end, each of which must be individually accessed by means of a fully qualified
path name. With an object store, typically the burden of location awareness shifts from the client to
the server [20].
² An object-based storage device (OSD) specification exists and has been ratified [21] but has not seen widespread commercial
use.
Basic Model
At its most basic any object store can be viewed from the perspective of the client and from the
perspective of the server. This section describes object stores from these perspectives, but adds a
third perspective that deals with the market tensions that surround enterprise-class object stores.
These market conditions have significantly colored the present perception of object stores as a class
and have a direct impact on their future development.
System View
From the system perspective the object store looks similar to a file store: there are clients and
servers; clients make data requests and servers service these requests. In the case of an object
store the client is typically not a human user, but an application specifically written to interact with
the DOS. Some object stores will front the core service by presenting different protocols to the
client, such as CIFS and NFS. These front-end systems act as protocol converters, arbitrating
the differing protocols used by clients with the protocol used by the server. For object stores that
operate in the cloud, such as Amazon S3 and Nirvanix, the protocol used to communicate with the
object store is typically HTTP and is often designed to qualify as a RESTful interface [18].
The use of HTTP as the high-level communications protocol indicates one of the first differences
between an object store and a traditional storage subsystem, as TCP/IP is the protocol of choice,
versus block-based protocols such as Fibre Channel. All this leaves an object store looking
suspiciously similar to a file store. Both are often used for "unstructured data," which at its simplest
is another way of saying "file" instead of database. Both are typically accessed via TCP/IP. So what
are some of the differences?
For starters, a file store is most commonly a file system of some variety exported for use on a LAN.
A file system is just another type of database and hence induces structure; structure then induces
limitations, such as the number of files that can be stored in a particular directory or file system. It
also enforces a grouping where none may naturally exist based on the file content itself. With an
object store there isn't the same notion of order and hierarchy. Instead the object store is viewed
as a flat namespace in which objects of various types are mixed together in a manner opaque to
the client. If structured storage is the china cabinet where dishes are neatly stacked and ordered by
type and circumference, unstructured data is the junk drawer in the kitchen where random items are
thrown together with no particular sense of connectedness a priori.
Even in cases where an object store presents the facade of a file system to the human user, the
objects themselves are scattered about the cluster in a manner uncontrolled by the client. So while
object stores ultimately provide a data storage repository, they are better understood as a software
system that collects user data and performs some set of actions against that data while holding it
on some form of persistent storage, which today typically takes the form of HDD.
Because it is ultimately a software system, the object store can perform an arbitrary set of functions
against the data, both during ingest and post-ingest, that go beyond the traditional storage
functions of creating local and remote replica copies, applying access permissions, performing
de-duplication, etc. For example, the object store may perform transformations on image files to
present different quality images to different classes of users, or it may perform sophisticated data
classification based on a set of criteria specified in the object metadata or in response to external
events, such as reaching a certain time boundary or access frequency.
Market Tension
The fact that an object store is really just a software system running on a cluster of servers means
that the system can theoretically perform an unbounded range of functions, and, properly designed,
do so at spectacular scale. It is this quality that creates a natural market tension when object stores
are introduced.
Since the object store itself can be extended to perform many, if not all, of the functions performed
by existing content management software, there no longer is a crisp demarcation between the
domains of the application software and the storage system. Once they've conquered the base
functions requisite for reliable data storage, it is a natural evolution for object stores to begin to
"move up the stack" and perform more and more functions that were once the exclusive domain of
the application vendor. Likewise, application vendors have taken note and are actively seeking to
add more of the functions of the object store to their own software, as nobody wants to see their
own product commoditized.
This tension has had the effect of dampening the growth of commercial object stores in the
enterprise market. However, it can be reasonably argued that from the perspective of the customer,
it is better to have common functions performed in a common way in a single place. Having any
number of applications provide the same functionality, each in distinct ways, raises IT cost as the
cost to train personnel and manage these systems must increase linearly at best.
How this dynamic plays out over the next few years will be interesting. The wildcard in all of
this is cloud. If enterprise-class object stores give only a glimpse of how and precisely where
data fragments are stored and reassembled, cloud makes this positively opaque. As cloud data
storage is built upon, and is an extension of, the principles of an object store as presently used in
the enterprise market, it is likely that the market will see a greater blurring of the line between
"application" and "storage" over time.
Client View
In the world of object stores clients are typically applications instead of human users³. Even when
there are human users acting against some common LAN file protocol such as CIFS or NFS, the
client of the object store is typically the piece of software performing protocol translation that sits
between the human user and the object store. Thus, to the client the object store is a software
service that sits on the other end of a TCP/IP connection and responds to requests much the same
as any other software service.
The degree of opacity of the object store varies by implementation type. In the case of content
addressable storage (CAS), the client is often returned only a completely opaque handle when
submitting an object and is given no information about the storage of the object [24]. In other
implementations, such as the Hitachi Content Platform, the client is presented a familiar file system
semantic as facade to the object store and is returned a file handle upon object ingest.
Of course, there's no intrinsic relationship between the presentation layer exposed to the client
and the manner in which the data is ultimately distributed and stored, but using a familiar facade
does provide a means for both the application and the human user to traverse their data in a
commonly understood fashion. It also allows for user-created logical groupings as a means of data
organization. It doesn't matter that under the covers Hitachi Content Platform doesn't group the
data in similar fashion, instead choosing the most efficient means of spreading the data across the
cluster, because the real purpose of the presentation layer is to help the end user better navigate the
system⁴.
³ Technically speaking, the client is always an application, even in the case of a user individually storing files on an NFS or CIFS
share. However, for the purposes of this paper some rounding is applied to focus on how systems are commonly viewed.
Every design model selected results in a set of tradeoffs. The DOS Characteristics section walks
through some of the more important differences between the various design models of object
stores.
Problems to Solve
To be successful an object store needs to solve several problems; some are basic, but some are
quite hard. In this section we survey the issues common to object stores in general. We begin by
briefly enumerating the basic problems common to all DOS implementations and then provide an
expanded look at two of the harder problems to solve: scalability and concurrency.
Basic Problems
The more common basic problems to be solved are briefly described below, with a more detailed
description in the DOS Characteristics section.
Multiple Entry Points
The system must allow for multiple independent applications simultaneously performing operations
such as read and write.
Global Namespace
An object store should present a global namespace (GNS) to the client.
Time Horizon
The time horizon for an object store can be decades. Therefore, the system must be designed to
periodically check the veracity of the data stored as all media degrade over time.
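One common way to perform such a periodic check is to record a cryptographic digest at ingest and recompute it later. The sketch below assumes a SHA-256 digest held next to each object in an in-memory dictionary. The function names ingest and scrub and the record layout are hypothetical and are not taken from any specific object store.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Digest recorded at ingest and recomputed on every scrub pass."""
    return hashlib.sha256(data).hexdigest()


def ingest(store: dict, name: str, data: bytes) -> None:
    """Store the payload together with the digest computed at write time."""
    store[name] = {"data": data, "sha256": checksum(data)}


def scrub(store: dict) -> list[str]:
    """Return the names of objects whose content no longer matches its digest."""
    return sorted(name for name, record in store.items()
                  if checksum(record["data"]) != record["sha256"])


store = {}
ingest(store, "a", b"alpha")
ingest(store, "b", b"bravo")
store["b"]["data"] = b"brav0"  # simulate silent corruption on the medium
damaged = scrub(store)
```

In a real system a damaged object would then be repaired from a replica or from erasure-coded fragments; that step is omitted here.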
Access Protocol
The access protocol should work equally well over a WAN (such as the Internet) as over a LAN.
Therefore, the access protocol cannot be chatty, as most network file system protocols tend to be.
Further, it should support mobile devices, such as smartphones, as clients.
Unstructured Data
The system must be designed to optimize for unstructured data as this will be the predominant type
of data over the next decade [9].
⁴ The exception to the general case is when a system administrator wants to assign a collection of objects to a particular class of
back-end storage subsystem. In such a case, the groupings presented matter.
Hardware Agnostic
Hardware changes frequently. This includes servers, storage subsystems, and even the storage
medium itself, as seen with the introduction of solid state drives (SSD), which provide better random
I/O properties but present different challenges for long term use (such as wear patterns). Given the
rule on time horizon, it is imperative that an object store be designed to be hardware agnostic. Like
the storage medium itself, all supporting hardware must be fungible.
The Bookkeeping Problem (Scale)
The Distributed Object Store section explained that at its core a DOS is essentially a DDB with
additional software layered on top to provide value-added features. Here, that thesis is expanded
with the assertion that the DDB design is the single most important aspect of the whole DOS
architecture. If you fail at the DDB design, the system will quickly reach maximum scale, limited not by
the amount of capacity that can be physically added to the cluster, but by the number of objects
the system can simultaneously keep track of and therefore allow to be ingested. The bookkeeping
problem is the principal gate to overall system scale when dealing with an object store, in both the
capacity and performance realms.
Scale is simply a hard problem to solve in computer science. The nirvana of "infinite scalability" looks
great on a marketing data sheet, but has thus far proven elusive in actual implementation. Making
the problem harder still, with object stores the time horizon problem means that scaling out needs to
be essentially seamless because forklift upgrades are counter to the promise of an "active archive."
A true enterprise-class DOS needs to scale to tens of billions (10^10) of user objects⁵. With cloud this
number has the potential to increase by several orders of magnitude, so it's easy to see how difficult
it becomes for a loosely coupled, distributed system to simultaneously keep track of cluster objects
in the hundreds of trillions (10^14). Relational databases (RDBMS) don't do well with table entries that
range into the trillions, and even if they did, the time to locate the multiple elements that constitute
a single user object can grow quadratically.
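The bookkeeping load can be estimated with simple arithmetic. The sketch below assumes each user object becomes n = 10 cluster objects, a value inferred from the paper's example of 10 billion user objects yielding 100 billion cluster objects. The function name cluster_objects is hypothetical.

```python
def cluster_objects(user_objects: int, fragments_per_object: int) -> int:
    """Entries the underlying database must track: one per cluster object."""
    return user_objects * fragments_per_object


# Assumed expansion factor n = 10 (illustrative, see the lead-in).
total = cluster_objects(10**10, 10)
```

Every additional order of magnitude in user objects carries through directly to the number of entries the distributed database has to index.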
There's no easy way to solve this problem with RDBMS technology. For starters, RDBMS software
is optimized for ACID consistency, which makes it suboptimal for distributed databases [11]. Further,
the cost to operate such a substantial RDBMS would be prohibitive, as it would
require best-of-breed commercial RDBMS software coupled with very fast (and therefore very
expensive) storage systems just to run the RDBMS alone, easily eclipsing the total cost of the object
store itself.
With the Hitachi Content Platform product it was determined that the best way to handle this
problem was to break the database up and distribute it more or less equally among the nodes of the
cluster, much the way objects are distributed. This model has several advantages:
• The number of objects each node in the cluster can store increases; thus, the resulting number
of objects the cluster can store grows quite large.
• The time to locate a particular object fragment is quicker as the lookup operation itself is likewise
partitioned.
• The protection model for the database itself can follow essentially the same model used to protect
user objects within the cluster, thus leading to greater protection for the DDB itself, which is crucial
to the proper function of the whole object store, as described in the Distributed Object Store
section.
⁵ Note that in several existing implementations, a DOS takes a single user object and converts it into n cluster objects, where
at the very least n>2 for simple mirroring, but more likely ranges to 2<n<14 if more storage efficient means are used for data
protection, such as erasure encoding [27]. At this scale the total cluster objects used to store 10 billion user objects can quickly
reach 100 billion cluster objects (10^11).
As discussed earlier, every design model necessarily brings with it a set of tradeoffs. In this case, the
greater resilience and operational efficiency of breaking up the database and distributing it among
the constituent nodes of the cluster leads to the issue of concurrency.
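The idea of partitioning the database across the nodes can be sketched as follows. This is only an illustration under stated assumptions: the class PartitionedIndex, its methods, and the choice of hashing the object identifier modulo the node count are hypothetical, and the paper does not say how Hitachi Content Platform actually assigns entries to nodes.

```python
import hashlib


class PartitionedIndex:
    """Location index split across nodes; each node holds only its own share."""

    def __init__(self, node_count: int):
        if node_count <= 0:
            raise ValueError("node_count must be positive")
        # One dictionary per node stands in for that node's database slice.
        self.partitions = [dict() for _ in range(node_count)]

    def _node_for(self, object_id: str) -> int:
        """Pick the owning node from a stable hash of the object identifier."""
        digest = hashlib.sha256(object_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def record(self, object_id: str, fragment_locations: list[str]) -> None:
        """Store the fragment locations on the node that owns this identifier."""
        self.partitions[self._node_for(object_id)][object_id] = list(fragment_locations)

    def locate(self, object_id: str) -> list[str]:
        """Consult only the owning partition, never the whole index."""
        return self.partitions[self._node_for(object_id)].get(object_id, [])


index = PartitionedIndex(node_count=4)
for i in range(1000):
    index.record(f"obj-{i}", [f"node-{i % 4}/frag-0", f"node-{(i + 1) % 4}/frag-1"])
found = index.locate("obj-42")
```

Plain modulo placement is chosen for brevity; it moves most entries when the node count changes, so a production design would more likely use a placement scheme that limits such reshuffling.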
Concurrency
Like infinite scale, concurrency is another of the well known "hard problems" to be solved in
computer science. The fact that we're talking about distributed object stores means that the
concurrency issue has to be solved in order to have a viable solution at large scale. Breaking up
a database into manageable parts and distributing the pieces among the nodes of the cluster may
solve the scale problem, but it's still a failure if the individual databases all have differing views of the
truth.
With very large scale systems dispersed over distant geographies, synchronous concurrency isn't
practical. There are a number of models used to provide suitable concurrency. Amazon's S3 uses
the "eventually consistent" model, which allows geographically dispersed sites to be inconsistent,
but only for a period of time considered sufficient for the solution provided [30]. How long it is
acceptable for the same object to be in different states is, of course, a function of the needs of the
client application.
While there are numerous dimensions to the problem, we can generally state that there's an
inverse relationship between the cost of the solution and the time it takes to reach perfect
synchronicity. Applications that require very fast synchronization, such as those seen in the financial
sector with bank transactions and stock market trades, require the customer to pay a premium for
systems that reach synchronicity very quickly. However, for a large swath of applications such cost
is unnecessary to adequately meet the client application needs.
In their initial instantiation, "cloud" applications tend toward this latter set of consumers in part
because the Internet is the medium of choice for data transport. Since the Internet is primarily
constructed as a packet-switched network where best effort is the accepted modus operandi,
using it as the transport medium acts as a limiter. With the Internet there are no hard guarantees
of consistent packet speed or ordering end to end. And while there are research ideas on how to
improve on known problems, such as "middle mile" congestion [25], the state of the Internet today
is a network built for ready global access at affordable cost.
This model stands in contrast to other networks, such as telephony, where a dropped or noisy signal
cannot be readily ameliorated by packet reordering on the receiving end. Therefore, the fidelity of
the circuit itself, both from the perspective of throughput consistency and signal loss, is placed at
greater premium.
What this demonstrates is that the choice of transport medium has considerable impact on the set
of design models that a distributed system can use to solve the concurrency problem.
With object stores there are two main dimensions: keeping physically connected but nonetheless
discrete nodes concurrent, and expanding this to likewise work with nodes that are geographically
dispersed. The former can be accomplished with some form of middleware that acts as reliable
transport even when layered on top of a transport where no guarantee of reliable message ordering
is provided. Hitachi Content Platform accomplishes this through the creation of a reliable messaging
system layered on top of the TCP/IP backbone that connects the physical nodes that constitute
the cluster. This, coupled with locking semantics among the various internal software subsystems,
allows for guaranteed concurrency regardless of which node is servicing a particular read request.
Expanding this to a logical cluster dispersed across distant geographies requires the basic tradeoff
of paying a premium for private network connections that come with fidelity guarantees similar
to those seen in telephony, or trading consistency for time. In the latter case, if a client requires
synchronous concurrency, i.e., all nodes in the entire logical cluster that house a copy of the user
data are consistent at the same instant, the tradeoff is potentially long delays before a write
success acknowledgment (ACK) can be returned to the client. Many clients are not designed
for such variable and potentially long delays in receiving a write ACK and will time out under the
assumption that an error must have occurred somewhere in the system.
The alternative is to return a success ACK to the client upon the first successful write of the object
and then leave it to the distributed object store to asynchronously ensure that a consistent view of
the object is held throughout the entire object store. While this latter model introduces a level of risk
and uncertainty to the client, it is generally the more economical method and can prove sufficient for
applications that are unlikely to simultaneously access an object immediately after its initial ingest.
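The two acknowledgment models can be contrasted in a small single-process simulation. Everything here is an assumption made for illustration: the class ReplicatedStore, its method names, the site names, and the use of a queue to stand in for background replication over a wide-area link.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of one logical cluster spread over several sites."""

    def __init__(self, site_names):
        self.sites = {name: {} for name in site_names}
        self.pending = deque()  # replication work not yet applied

    def write_sync(self, key, value):
        """Update every site before acknowledging: consistent but slow."""
        for replica in self.sites.values():
            replica[key] = value
        return "ACK"

    def write_async(self, key, value, first_site):
        """Acknowledge after the first write; queue the remaining copies."""
        self.sites[first_site][key] = value
        for name in self.sites:
            if name != first_site:
                self.pending.append((name, key, value))
        return "ACK"

    def replicate_pending(self):
        """Background step that brings the other sites up to date."""
        while self.pending:
            name, key, value = self.pending.popleft()
            self.sites[name][key] = value

    def read(self, key, site):
        return self.sites[site].get(key)


store = ReplicatedStore(["tokyo", "london"])
ack = store.write_async("obj-1", b"v1", first_site="tokyo")
stale = store.read("obj-1", "london")   # remote site has not caught up yet
store.replicate_pending()
fresh = store.read("obj-1", "london")   # consistent once replication runs
```

The window between the ACK and the call to replicate_pending is the period of inconsistency that the client must be able to tolerate under the asynchronous model.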
How a particular implementation of an object store solves the two big problems of scale and
concurrency has a substantial impact on the customer. Since it is comparatively easy to solve these
problems at very low object counts, it can be especially challenging for the consumer of an object
store to make an informed purchasing decision as the problems that arise from these system
design tradeoffs may not manifest themselves until the system has been in use for a long time and a
significant object count has accumulated.
DOS Characteristics
Common Characteristics
At the conceptual level all object stores share certain similarities; they differ in the particular means
of implementing these core functions, which results in differing upper bounds for performance and
object count.
There's a common set of desired characteristics that a present-day enterprise class object store is
expected to have. The minimum set includes the ability to:
• Grow capacity as needed.
• Tightly couple data and metadata.
• Present a global namespace to the client.
• Deal with a time horizon that shifts to decades.
• Be equally accessible over the LAN and WAN.
The following sections discuss these five key characteristics.
Capacity on Demand
Conceptually this ideal is simple: customers would like to purchase a system that can seamlessly
grow capacity on an as-needed basis. In practice, building such a system presents many
challenges. The most obvious is that the hardware systems (compute and storage) will change
over time. Marrying the old to the new requires at a minimum that the physical form factors are
compatible. People who have owned more than one laptop in their lives know that just getting two
that use the same power supply and adapter is difficult. When you multiply this problem to extend
to the full complement of mechanical and electrical elements that must coexist within the same
systems, it is easy to see how this problem grows geometrically.
The second, and even greater, challenge is designing a software system capable of seamless
capacity additions. Object stores typically mask the detail of configuring the back-end storage
subsystem from the system administrator. This proves useful in helping keep system management
costs down and thus lowers the total cost of ownership. However, this also means that the object
store must continue to offer this same ability even as the underlying storage subsystems change.
Because these systems will all operate in different manners, the challenge to the object store is to
be able to know the characteristics of each back-end data store and seamlessly operate across
different generations.
This causes a significant amount of processing overhead, as each device will not only have
different configuration parameters but also will require different access methods to achieve optimal
performance. It is up to the object store to keep track of these details and alter the way data
access is configured and realized across the spectrum of back-end stores. The result is that a
single algorithm cannot be used universally for all data access methods, as it can produce inferior
performance results at best and simple failure at worst.
The net result is that the variables involved with adding capacity on demand must be taken into
account during the initial design of the system or the object store will become increasingly brittle as
new generations of capacity are added over time.
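One way to picture the software side of capacity on demand is a store that talks to every storage generation through a single narrow contract and keeps the device-specific details behind it. The following Python sketch is illustrative only: the names Backend, InMemoryBackend, ObjectStore and chunk_size are assumptions made for this example, and the placement policy (most free space wins) is arbitrary rather than anything a particular product is known to use.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical contract that every storage generation must satisfy."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def free_bytes(self) -> int: ...


class InMemoryBackend(Backend):
    """Stand-in for one hardware generation with its own access parameters."""

    def __init__(self, capacity: int, chunk_size: int):
        self.capacity = capacity
        self.chunk_size = chunk_size  # device-specific tuning (assumed)
        self._chunks = {}             # key -> list of chunks

    def write(self, key, data):
        if len(data) > self.free_bytes():
            raise IOError("backend full")
        # Lay the payload out in the chunk size this device prefers.
        self._chunks[key] = [data[i:i + self.chunk_size]
                             for i in range(0, len(data), self.chunk_size)]

    def read(self, key):
        return b"".join(self._chunks[key])

    def free_bytes(self):
        used = sum(len(c) for chunks in self._chunks.values() for c in chunks)
        return self.capacity - used


class ObjectStore:
    """Hides which back end holds an object; clients see only put and get."""

    def __init__(self):
        self._backends = []
        self._location = {}  # key -> back end that holds it

    def add_capacity(self, backend: Backend) -> None:
        self._backends.append(backend)

    def put(self, key: str, data: bytes) -> None:
        # Placement policy is arbitrary here: reuse the current home of the
        # key, otherwise pick the back end with the most free space.
        target = self._location.get(key) or max(
            self._backends, key=lambda b: b.free_bytes())
        target.write(key, data)
        self._location[key] = target

    def get(self, key: str) -> bytes:
        return self._location[key].read(key)
```

In this sketch a newer, larger back end can be added with add_capacity while the system is in use, and clients keep calling put and get unchanged. A real system would also have to handle migration, failure and per-device performance tuning, which are omitted here.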
Data and Metadata
A significant benefit of an object store over traditional storage is the ability to couple an arbitrary set
of application- and system-defined metadata with the original data set (see the Distributed Object
Store section). This allows an entirely new set of functions to be performed against objects not only at
ingest, but throughout their life in the object store, as metadata presents a means for significantly
expanding the value of the data.
In general the shelf life of applications will be less than that of the data stored. Therefore, it can be
critical that information that identifies the client (e.g., the revisions of the application and associated
software) is stored in the metadata so that the user knows which version of the application is
needed to actually use the data when it is eventually retrieved. Otherwise, the value of the data
quickly approaches zero. Further, metadata provides an easy mechanism for consistency since
all attributes about the data can be stored in a single, readily accessible place, i.e., in the metadata
associated with the object.
The challenge for the object store is to allow rich metadata that can grow and be altered post
ingest, and to always be able to retain the coupling of the metadata with the object data for the
entire life of the object.
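The coupling described above can be sketched as a record that carries its data together with system and custom metadata, where the data never changes after ingest but the metadata may. This is a minimal Python illustration, not a description of any product's data model; StoredObject, ingest, annotate and the metadata keys are all names invented for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                                      # payload, fixed at ingest
    system_meta: dict = field(default_factory=dict)  # maintained by the store
    custom_meta: dict = field(default_factory=dict)  # supplied by clients

    def annotate(self, **entries) -> None:
        """Add or change custom metadata after ingest; data is untouched."""
        self.custom_meta.update(entries)
        self.system_meta["meta_modified"] = time.time()


def ingest(data: bytes, app_name: str, app_version: str) -> StoredObject:
    obj = StoredObject(data=data)
    obj.system_meta["ingested"] = time.time()
    # Record which client wrote the object so that a future reader knows
    # which application version is needed to interpret the data.
    obj.custom_meta["writer"] = {"app": app_name, "version": app_version}
    return obj
```

For example, calling ingest(scan_bytes, "viewer", "2.1") followed later by annotate(surgery=True) leaves the scan bytes unchanged while the metadata grows over the life of the object.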
Global Namespace
Another significant value of an object store is that it presents to the clients a single global
namespace. This unburdens client applications from the need to keep track of where data is stored
in perpetuity, which not only simplifies the client storage logic, but also has the side effect of making
applications more resilient to changes in the data center [20].
Time Horizon
Traditional spinning-disk storage solutions deal with a data life measured in months or years. An
object store by contrast can be expected to deal with a data life measured in decades. The life of
the objects may be dictated by regulation, or the objects may simply be expected to always be
present as a matter of course.
This increase in life expectancy makes continual and automatic checks of the veracity of the stored
data a must. All storage media will experience irrevocable data loss given enough time. The expected
interval, measured as mean time to data loss (MTTDL), varies by medium and usage pattern. However,
the common characteristic is that for all media, MTTDL is finite. Therefore, it is incumbent
upon the object store to actively check and repair data objects that have become corrupted for
whatever reason. Typically, object stores will use some combination of hashing and/or direct binary
comparisons to verify that the data returned to the client is the data that was stored [19].
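The check-and-repair cycle can be sketched as a scrub pass that recomputes a hash for each replica, compares it with the digest recorded at ingest, and overwrites any damaged replica from an intact one. The Python below is a minimal sketch under stated assumptions: the choice of SHA-256, the use of in-memory bytearray replicas, and the function names digest and scrub are all illustrative and are not taken from the paper.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest recorded at ingest (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def scrub(replicas: list, expected: str) -> int:
    """Repair replicas whose hash no longer matches; return the repair count.

    `replicas` is a list of bytearray objects standing in for stored copies.
    """
    good = next(
        (bytes(r) for r in replicas if digest(bytes(r)) == expected), None)
    if good is None:
        raise RuntimeError("no intact replica left; data is lost")
    repaired = 0
    for r in replicas:
        if digest(bytes(r)) != expected:
            r[:] = good  # overwrite the corrupted copy in place
            repaired += 1
    return repaired
```

Run periodically, a pass like this catches silent corruption while at least one good copy still exists. If every replica fails the check, the sketch raises an error, which corresponds to the irrevocable data loss the text describes.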
Accessibility
A well-constructed object store will be equally accessible by LAN or WAN. This implies that
traditional LAN-based protocols such as CIFS and NFS are insufficient as the sole access
mechanism. While these protocols suffice for a LAN, they are too chatty for long distance
communications. Presently, HTTP is the lingua franca of the Internet, which accounts
for its prevalence in Internet-based cloud storage systems today. However, an object store should
anticipate that the access protocol will change over time and be designed to accommodate a swap
of access protocol just as it must accommodate a change in storage medium.
Differentiation
The Distributed Object Store section noted that it can be difficult for an object store consumer
to know how well a system will scale or retain a consistent view across the cluster until after the
system has been in use for a substantial period of time. In this section, we list four areas that are
readily apparent for various implementations of an object store, and therefore do not have the
handicap of being seen only after the system has been running for a long time.
Degree of Abstraction
An object store has the potential to abstract away the detail of the storage, which provides
substantial benefit to both the application and the customer. There are two main areas that can be
abstracted:
• The ability to homogenize various back-end storage subsystems
• The ability to allow arbitrary metadata to grow quite large
The extent to which the object store can distill away all distinction of the back-end data store has
a substantial impact on the long term usability of the system. As noted in the Introduction, HDD
density improvements eclipse even those of processors. Therefore, a system with a time horizon of
decades must provide a near-perfect level of storage abstraction, or client applications will need to
change to efficiently use new storage systems and methods of data storage. For example, if an object
store uses local disk within the compute nodes themselves, it must present no change to the client
if the system is later swapped out to use more advanced storage subsystems.
Metadata is at the heart of any well-constructed object store. The extent to which the client and
system administrator are able to add unlimited custom metadata is a mark of the utility of the object
store, as metadata can often grow much larger than the data itself. Systems that limit metadata
size and type invariably limit the set and type of applications that can benefit from the object store.
Additionally, it is useful to allow changes to metadata over time to extend the usefulness of the
system. For example, a functional MRI (fMRI) scan will not change, but it is valuable to be able to
update the metadata associated with the fMRI to indicate the progress of the patient, such as with a
record of who the patient has seen, whether the patient has had surgery, etc.
Discernible Namespace
Object stores tend to present either a completely opaque namespace, such as in the case of CAS
systems that use a hash of the object as the only handle, or a namespace that is traversable and
human readable, such as through the presentation of a file system facade.
The value of the former is that there can be only one handle for an object regardless of where it
exists in the object store, assuming that hash collisions are not a factor. The disadvantage is that
such a model makes it difficult, if not impossible, for either the client application or the system
administrator to trace objects stored in the cluster. There's a certain trust factor that comes into play
when using a system that provides zero visibility into objects once they're ingested into the object
store. This weakness is removed when instead the object store presents a traversable file system
facade. In this case both client application and human users can readily see the objects that are
present on the cluster.
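The contrast between the two namespace styles can be made concrete with a small Python sketch. The paper presents them as alternatives; layering a path facade on top of a content-addressed store, as done here, is only one possible arrangement and is an assumption of this example, as are the class names ContentAddressedStore and PathFacade and the use of SHA-256 as the handle.

```python
import hashlib


class ContentAddressedStore:
    """Opaque namespace: the only handle is a hash of the object."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        handle = hashlib.sha256(data).hexdigest()
        self._objects[handle] = data  # identical data maps to one handle
        return handle

    def get(self, handle: str) -> bytes:
        return self._objects[handle]


class PathFacade:
    """Traversable, human-readable namespace over the opaque store."""

    def __init__(self, store: ContentAddressedStore):
        self._store = store
        self._paths = {}  # human-readable path -> opaque handle

    def write(self, path: str, data: bytes) -> None:
        self._paths[path] = self._store.put(data)

    def read(self, path: str) -> bytes:
        return self._store.get(self._paths[path])

    def listdir(self, prefix: str) -> list:
        return sorted(p for p in self._paths if p.startswith(prefix))
```

With only the opaque store, a client that loses a handle has no way to enumerate what is stored. The facade adds the listdir view that lets both applications and administrators see which objects are present.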
Degree of Freedom
While there are active efforts to create a standard interface for object stores [22], today most object
stores, whether in the cloud or enterprise, create a private API that client applications must write
to in order to make use of the object store. The degree to which this API is unique to one and only
one vendor is the degree to which customers are locked in to that vendor's offering. While such
"stickiness" is advantageous to the vendor, it is equally limiting to the customer. To the extent that an
object store does not require a proprietary API or allows access through multiple standard protocols
such as CIFS and NFS, the customer has a greater degree of freedom to change vendors as they
desire.
Data Protection
The basic tradeoff for data protection is coding complexity versus storage efficiency. The easiest
system to build will simply create n clones of the original object. The disadvantage of this model
is that it is space inefficient; i.e., at best a customer can use only 50% of the raw capacity. More
space-efficient means, such as RAID, have the benefit of allowing the customer to use more of the
capacity purchased, but are harder to implement.
Ideally the object store will allow customers to select the data protection model that best suits their
needs and will run equally well regardless of the data protection selected. To do this well typically
requires the use of a sophisticated back-end storage subsystem.
While there are means to gain greater storage efficiency without traditional RAID storage
subsystems, such as erasure encoding [27], these methods are still relatively novel and therefore
have not undergone the rigor of extensive customer use typified by RAID models. If the object store
is to be used as the final home for object data, the means of data protection must be solid or it risks
data loss.
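The storage-efficiency side of this tradeoff reduces to simple arithmetic. The Python sketch below assumes that "n clones" means n total stored copies including the original, which matches the 50% best case quoted above for two copies, and it models parity-based schemes (RAID-style parity or erasure codes) generically as k data units plus m redundancy units. The function names and the example parameters are illustrative assumptions.

```python
from fractions import Fraction


def replication_efficiency(copies: int) -> Fraction:
    """Usable fraction of raw capacity when `copies` full copies are kept."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return Fraction(1, copies)


def parity_efficiency(data_units: int, parity_units: int) -> Fraction:
    """Usable fraction for a scheme with k data units and m parity units."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need k >= 1 data units and m >= 0 parity units")
    return Fraction(data_units, data_units + parity_units)
```

Under these assumptions two copies yield 1/2 of raw capacity and three copies yield 1/3, while a layout with four data units and one parity unit yields 4/5. The higher usable fraction is what the additional coding complexity buys.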
Futures
Rise of Intelligent Storage
The Distributed Object Store section introduced the notion that an object store is in part a move
from what is traditionally labeled as "dumb storage" to an intelligent system, capable of performing
arbitrarily complex functions against the data set. This represents a significant shift in the storage
industry. Previously, the intelligence was left to the application and the scope of the storage
subsystem was constricted to concerns such as data protection models. It is this shift to intelligent
storage that has marked the method selected by new entrants to cloud storage. It is not surprising
that the first to embrace intelligent storage are those outside the mainstream storage industry, as
new entrants don't have existing storage lines that could be cannibalized as a result.
Granularity
State of the art today is for the scope of the intelligence to be with the object store itself, and this is
working well with object counts that number in the hundreds of millions to about a billion. However,
the scale issue will only grow worse. It is a lot easier to store a petabyte than it is to store a billion
objects.
To make that significant next jump in scale will likely require that the intelligence be ingrained in the
objects themselves. In such a model individual objects would have the "DNA" to know when to
create clones of themselves and how to adjust to changes in environment. For example, in the case
of a rush of read requests in a particular geography, objects would be cloned and migrate to the hot
spot to service requests locally. Once read activity subsided, objects would know to die off as there
would no longer be a need for such a large population.
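As a thought experiment, the clone-and-die-off behavior could look like the following Python sketch, in which each object carries its own replication rule instead of consulting a central controller. Everything here is hypothetical: the class name SelfManagingObject, the region labels, and the rule of one replica per 100 reads per interval are invented for illustration and do not describe an existing system.

```python
class SelfManagingObject:
    """Hypothetical object that decides its own replica count per region."""

    def __init__(self, home: str, reads_per_replica: int = 100):
        self.home = home
        self.reads_per_replica = reads_per_replica
        self.replicas = {home: 1}  # region -> number of local copies

    def observe(self, region: str, reads_last_interval: int) -> None:
        # Ceiling division: one replica per `reads_per_replica` reads.
        wanted = -(-reads_last_interval // self.reads_per_replica)
        if region == self.home:
            wanted = max(wanted, 1)  # never drop the last home copy
        if wanted:
            self.replicas[region] = wanted   # clone toward the hot spot
        else:
            self.replicas.pop(region, None)  # clones die off when idle
```

A burst of 250 reads from a remote region would grow three local clones there under this rule, and a later interval with no reads would remove them again, leaving only the home copy.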
Extreme Scale
We can use other systems with very large scale as a means of comparison, such as the human
organism, which contains tens of trillions of cells. The human organism couldn't operate at such
scale if it were bounded by the limit of a master control program that acted as gatekeeper to all
cellular activity. Instead, the human organism is controlled by a set of autonomic functions that
operate independently of conscious thought and thus can perform the myriad functions necessary
to keep such a complex of cells operating as a single unit. It is not hard to imagine that to achieve
extreme scale in the tens of trillions, intelligent object stores will likewise need to push down
some of the intelligence to the objects themselves, thus creating "intelligent objects."
In medicine similar ideas exist in the research community: for example, consider the use of
nanomedicine as a novel way of targeting cancer by viewing the human organism as a system of
interacting molecular networks and targeting disruptions in the system with nanoscale technologies [12].
Conceptually, similar research is occurring in the computer science realm with protein-based
computers as future replacements for existing silicon-based systems, modeling the complex
protein-signaling networks that sense a cell's chemical state and respond appropriately [16]. Here
too the idea is that to get to that next level of extreme scale means a shift away from the ideas of
central processing units in hardware, or master control programs of some variety in software, to a
much more granular level of actions and knowledge at the individual object level.
Perhaps the best contemporary example we see of a very large-scale system that distributes
intelligence to the nodes and operates without a master control program is the Internet itself. It is
easy to understand that a system of this scale couldn't operate without distributing intelligence.
Beyond Archive
For traditional storage vendors there are two near term challenges:
1. Figuring out how to position object stores
2. Moving beyond the archive
While early entrants in cloud storage don't need to contend with how to position an object store
against other lines of storage, such is not the case with storage vendors themselves. It is always
a challenge figuring out how to slot in new technologies in a way that is easy to explain to a global
sales force while not cannibalizing sales of other lines. The method of choice thus far has been to
artificially characterize object stores as suitable only for archive data, leaving existing lines to handle
other types of data. While this may solve a near-term problem of product positioning, it fails to take
full advantage of the broad set of functions that an object store is capable of performing.
This works well so long as everyone agrees to play by the same rules, which to date has been more
or less true with traditional storage vendors. However, the new entrants in cloud storage aren't
constraining themselves in this manner and therefore are expanding the use case for object stores
beyond just archive. The challenge for existing storage vendors is to see that the real competition in
the 21st century may not come from the same competitors of the last decade - a lesson that the
now defunct minicomputer manufacturers were never able to fully learn when the microcomputer
took over in the 1990s [3].
Summary
In this paper we described the fundamental principles of operation of distributed object stores in
general, with a focus on the various challenges system designers face and the associated tradeoffs
inherent to design selection. We then reduced these general concepts to specific examples of
the challenge of creating an object store that is truly scalable while remaining coherent. Finally, we
concluded with thoughts on how such systems must evolve to make the next big step in system
scale and operation, thereby extending the market opportunity.
Appendix A
Acknowledgments
I would like to thank the following reviewers: Carl D'Halluin, Jonathan Chinitz, John Dicker, John
Hilliar, Scott Nyman and Scan Putegnat.
Author
Robert Primmer works at Hitachi Data Systems in the Global Solutions Strategy and Development
division as Sr. Technologist and Sr. Director of Content Services Product Management, where he
works on the Hitachi Content Platform
(http://www.hds.com/products/storage-systems/content-platform/index.html), which is a distributed
object store. Prior to Hitachi Data Systems, he worked on the Centera and Atmos object stores at
EMC. He is a member of the ACM, IEEE and IEEE Computer Society.
References
[1] A. Grunbacher, "POSIX Access Control Lists on Linux," SuSE Labs, s.a.
[2] A. Thomasian, M. Blaum, "Higher Reliability Redundant Arrays: Organization, Operation and
Coding," ACM Transactions on Storage, vol. 5, No. 3, Article 7, November 2009.
[3] C. Christensen, "The Innovator's Dilemma: The Revolutionary Book that Will Change the Way
You Do Business," HarperBusiness, 2000.
[4] C. Walter, "Kryder's Law," Scientific American, pp. 32-33, August 2005.
[5] C. Walter, "Letters," Scientific American, p. 14, December 2005.
[6] D. Josephy, "Medicine's Next Big Battlefield: Your Home," BusinessWeek, April 27, 2009.
[7] G. Chockler, et al., "Reliable Distributed Storage," IEEE Computer, pp. 60-67, April 2009.
[8] H. Sirkin, "Slow Economy, Advancing at Warp Speed," BusinessWeek, September 8, 2009.
[9] IDC, "The Diverse and Exploding Digital Universe," March 2008.
[10] J. Garrisson, A. L. Narashima Reddy, "Umbrella File System: Storage Management across
Heterogeneous Devices," ACM Transactions on Storage, vol. 5, No. 1, Article 3, March 2009.
[11] J. Gray, "The Transaction Concept: Virtues and Limitations," Proceedings of Seventh
International Conference on Very Large Databases, September 1981.
[12] J. Heath, M. Davis, L. Hood, "Nanomedicine Targets Cancer," Scientific American, pp. 44-51,
February 2009.
[13] J. Van Bogart, "What can go wrong with magnetic media," Publishing Research Quarterly, vol.
12, No. 4, pp. 65-77, December 1996.
[14] M. Ahmetovic, et al., "Optical Storage Media Industry Analysis," Optical Storage Media Industry
Report-1, s.a.
[15] M. Ratner, "Hitachi Content Archive Platform: Architecture Overview and Interface Performance
Version 2.6, Architecture Guide and Performance Brief," June 2009.
[16] N. Ramakrishnan, U. Bhalla, J. Tyson, "Computing with Proteins," IEEE Computer, pp. 47-56,
January 2009.
[17] P. Vixie, "What DNS Is Not," Communications of the ACM, vol. 52, No. 12, pp. 43-47, December
2009.
[18] R. Fielding, "Architectural Styles and the Design of Network-based Software Architectures," PhD
Thesis, University of California, Irvine, 2000.
[19] R. Primmer, C. D'Halluin, "Collision and Preimage Resistance of the Centera Content Address,"
June 2005.
[20] R. Primmer, "Efficient Long-Term Data Storage Utilizing Object Abstraction with Content
Addressing," July 2003.
[21] R. Weber, "Information Technology - SCSI Object-Based Storage Device Commands - 2
(OSD-2)," Revision 4, July 2008.
[22] SNIA, "Cloud Data Management Interface," Version 1.0g, February 9, 2010.
[23] S. Quinlan, S. Dorwards, "Venti: A new approach to archival storage," Usenix Conference on File
and Storage Technologies, 2002.
[24] S. Rhea, et al., "Fast, Inexpensive Content-Addressed Storage in Foundation," Proceedings of
the 2008 USENIX Annual Technical Conference, 2008.
[25] T. Leighton, "Improving Performance on the Internet," Communications of the ACM, vol. 51, No.
2, pp. 45-51, February 2009.
[26] US House of Representatives, "Conference Report on H.R. 1, American Recovery and
Reinvestment Act of 2009," February 12, 2009, p. H1337.
[27] V. Guruswami, A. Rudra, "Error Correction up to the Information-Theoretic Limit,"
Communications of the ACM, vol. 52, No. 3, pp. 87-95, March 2009.
[28] V. Henson, "The code monkey's guide to cryptographic hashes for content-based addressing,"
LinuxWorld, November 12, 2007.
[29] V. Vrable, S. Savage, G. Voelker, "Cumulus: Filesystem Backup to the Cloud," ACM Transactions
on Storage, vol. 5, No. 4, Article 14, December 2009.
[30] W. Vogels, "Eventually Consistent," Communications of the ACM, vol. 52, No. 1, pp. 41-44,
January 2009.
Corporate Headquarters
750 Central Expressway
Santa Clara, California 95050-2627 USA
www.hds.com
Regional Contact Information
Americas: +1 408 970 1000 or [email protected]
Europe, Middle East and Africa: +44 (0) 1753 618000 or [email protected]
Asia Pacific: +852 3189 7900 or [email protected]
Hitachi is a registered trademark of Hitachi, Ltd., in the United States and other countries. Hitachi Data Systems is a registered trademark and service mark of Hitachi, Ltd., in the United
States and other countries.
All other trademarks, service marks and company names in this document or website are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by
Hitachi Data Systems Corporation.
© Hitachi Data Systems Corporation 2010. All Rights Reserved. WP-376-A DG July 2010
