dc.description.abstract | Since the turn of the twenty-first century, the evidence overwhelmingly reveals that the
amount of data we collect doubles every 12–14 months
(Kryder’s law). The growth momentum of the volume and complexity of digital
information we gather far outpaces the corresponding increase in computational
power, which doubles every 18 months (Moore’s law). There is a substantial imbalance
between the increase of data inflow and the corresponding computational
infrastructure intended to process that data. This calls into question our ability to
extract valuable information and actionable knowledge from the mountains of digital
information we collect. Nowadays, it is very common for researchers to work with
petabytes (PB) of data, 1 PB = 10^15 bytes, which may include nonhomologous
records that demand unconventional analytics. For comparison, the Milky Way
Galaxy has approximately 2 × 10^11 stars. If each star represents a byte, then one
petabyte of data corresponds to 5,000 Milky Way Galaxies.
This data storage-computing asymmetry leads to an explosion of innovative data
science methods and disruptive computational technologies that show promise to
provide effective (semi-intelligent) decision support systems. Designing, understanding,
and validating such new techniques require deep within-discipline basic
science knowledge, transdisciplinary team-based scientific collaboration, open-scientific
endeavors, and a blend of exploratory and confirmatory scientific discovery.
There is a pressing demand to bridge the widening gaps between the needs and
skills of practicing data scientists, the advanced techniques introduced by theoreticians,
the algorithms invented by computational scientists, the models constructed by biosocial
investigators, and the network products and Internet of Things (IoT) services engineered by
software architects. | en_US |