It’s no secret that rocket ... err ... data scientists are in short supply.
The upshot of the explosion of data, the corresponding explosion of tools, and
the knock-on impacts of Moore’s and Metcalfe’s laws is that there is more
data, more connections, and more technology to process it than ever. At last
year’s Hadoop World, there was a feeding frenzy for data scientists, which
only barely outstripped demand for the more technically oriented data architects.
In English, that means:
Potential MacArthur Grant recipients who have a passion for and insight into data,
the mathematical and statistical prowess for ginning up the algorithms, and
the artistry for painting the picture that all that data leads to. That’s
what we mean by data scientists.
People who understand the platform side of Big Data, a.k.a. data architects
or data engineers.
The data architect side will be the more straightforw... (more)
As we’ve noted previously, the measure of success of an open source stack
is the degree to which the target remains intact. That comes either through
a captive open source project, where a vendor unilaterally open sources
its code (typically hosting the project) to promote adoption, or through a
community model, where a neutral industry body hosts the project and gains
support from a diverse cross section of vendors and advanced developers. In
that case, the goal is getting the formal standard to also become the de
facto standard.
The most successful open source projects are those t... (more)
Informatica is within a year or two of becoming a $1 billion company, and
CEO Sohaib Abbasi’s stretch goal is to get to $3 billion.
Informatica has been on a decent tear. It’s had a string of roughly 30
consecutive growth quarters, with growth over the last six years averaging 20% and
2011 revenues nearing $800 million. Abbasi took charge back in 2004, lifting
Informatica out of its midlife crisis by ditching an abortive foray into
analytic applications, instead expanding from the company’s data
transformation roots to data integration.
Getting the company to its current level came largely through a seri... (more)
Of the 3 “V’s” of Big Data – volume, variety, velocity (we’d add
“Value” as the 4th V) – velocity has been the unsung ‘V.’ With the
spotlight on Hadoop, the popular image of Big Data is petabyte-scale
stores of unstructured data (volume and variety, the first two V’s). While Big Data
has been thought of as large stores of data at rest, it can also be about
data in motion.
“Fast Data” refers to processes that require lower latencies than would
otherwise be possible with optimized disk-based storage. Fast Data is not a
single technology, but a spectrum of approaches that process data t... (more)
To date, Big Storage has been locked out of Big Data. It’s been all about
direct attached storage for several reasons. First, Advanced SQL players have
typically optimized their architectures through data structure (columnar storage),
unique compression algorithms, and liberal use of caching to juice response
times over hundreds of terabytes. For the NoSQL side, it’s been about cheap,
cheap, cheap along the Internet data center model: have lots of commodity
stuff and scale it out. Hadoop was engineered exactly for such an
architecture; rather than speed, it was optimized for sheer linear scale.... (more)