Abstracts
Chris Cera, CTO, Vuzit
Toby DiPasquale, Chief Architect, CubeTree
Andrew Kortina, Lead Engineer, tinydb.org
Ken Rimple, Software Architect, Chariot Solutions
Time/Location: 12:00-1:00, Room 106/107
The panel will cover both the pros and cons of Amazon's EC2 and S3, including how, in many cases, they can eliminate the need for an internal data center or a web hosting provider. We'll cover topics such as backup/recovery scenarios, how our panel members have handled outages, and strategies for debugging when your app is running in a "black box" environment. We'll also address concerns about vendor lock-in.
Jeff Barr, Senior Evangelist, Amazon Web Services
Time/Location: 9:00-10:00, Room 106/107
Jeff will discuss the latest developments in Amazon's web services, illustrated with examples of how forward-thinking companies of all sizes are building cost-effective and highly scalable web sites, search engines, research tools, data archives, and more. He will explain how the Elastic Compute Cloud, the Simple Storage Service, and the other Amazon Web Services form the basis for a new and very powerful internet computing platform.
Prasad Chakka, Data Engineer, Facebook
Time/Location: 1:15-2:15, Room 105
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making existing commercial warehousing solutions prohibitively expensive. Hadoop is a popular open-source map-reduce implementation that is being used as an alternative for processing extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs that are hard to maintain and reuse.
In this presentation, we introduce Hive, an open-source warehousing solution built on top of Hadoop. Hive supports queries expressed in an SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs executed on Hadoop. The language includes a type system with support for primitive types like integers and strings, collections like lists and maps, and object-oriented data types like structures. Hive also includes a system catalog, the metastore, containing schemas and statistics, which is useful for data exploration and query optimization. Hive provides a shell utility that can be used to browse metadata and execute HiveQL queries. And, like existing warehousing solutions, Hive provides utilities to load data into the warehouse and extract data from it, making it a complete solution for standard extract-transform-load (ETL) applications. Hive is available for general use under an open-source license.
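To make the compilation step more concrete, here is a minimal, in-process Python sketch of how a simple aggregation such as `SELECT page, COUNT(1) FROM page_views GROUP BY page` could map onto map, shuffle, and reduce phases. The table `page_views`, its rows, and the function names are hypothetical; this illustrates the general technique only and is not Hive's actual query planner, which generates real Hadoop jobs.

```python
from collections import defaultdict

# Hypothetical rows for a table like page_views(user_id, page).
ROWS = [
    ("u1", "/home"),
    ("u2", "/about"),
    ("u1", "/home"),
    ("u3", "/home"),
]

def map_phase(rows):
    # Mapper for COUNT(1) ... GROUP BY page: emit (group key, 1) per row.
    for _user_id, page in rows:
        yield page, 1

def shuffle(pairs):
    # Group values by key, the role Hadoop's shuffle/sort plays between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the partial counts for each key.
    return {key: sum(values) for key, values in groups.items()}

def run_query(rows):
    return reduce_phase(shuffle(map_phase(rows)))

print(sorted(run_query(ROWS).items()))
```

The point of a declarative layer like HiveQL is that the user writes only the one-line query; the split into mapper, shuffle, and reducer is generated for them.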
Britt Crawford and Justin McCarthy, Co-founders, HiveDB Project, Senior Software Engineers, Cafepress.com
Time/Location: 2:30-3:30, Room 105
Massively scalable systems used to be the sole province of large corporations with acronyms for names and budgets the size of a small country's GDP, but with the explosion of user-created content sites, scaling is now everybody's problem.
One of the main strategies for scaling databases is to partition the data across many servers. This is a simple enough idea but requires a good deal of expertise in systems design, programming, and database administration to execute successfully. Many horizontal partitioning systems have been built and put into production in corporate environments, but so far none of them have:
- Focused on high-throughput OLTP rather than OLAP workloads
- Built atop a trusted data storage foundation with well-known administration procedures
- Released their system under an open source license
This session introduces HiveDB, an open source framework for partitioning MySQL that implements the core ideas behind horizontal partitioning in as clear and concise a way as possible.
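The core routing idea behind horizontal partitioning can be sketched in a few lines of Python. This sketch assumes simple hash-modulo routing over a fixed list of MySQL shards; the shard URLs and the `shard_for` helper are hypothetical, and HiveDB itself may assign keys to partitions differently.

```python
import zlib

# Hypothetical connection strings for four MySQL shards.
SHARDS = [
    "mysql://shard0.example.internal/app",
    "mysql://shard1.example.internal/app",
    "mysql://shard2.example.internal/app",
    "mysql://shard3.example.internal/app",
]

def shard_for(partition_key, shards=SHARDS):
    """Route a partition key (e.g. a customer ID) to exactly one shard.

    zlib.crc32 returns the same value in every process, unlike Python's
    built-in hash() for strings, which is randomized per interpreter run.
    """
    index = zlib.crc32(partition_key.encode("utf-8")) % len(shards)
    return shards[index]

# Every row for one customer lands on the same shard, so single-customer
# OLTP queries touch only one MySQL server.
print(shard_for("customer-42"))
```

One trade-off of plain modulo hashing is that changing the number of shards remaps most keys; a lookup table (directory) from key to partition avoids that, at the cost of an extra lookup per query.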
Toby DiPasquale, Chief Architect, CubeTree
Time/Location: 10:15-11:15, Room 105
The world of massively parallel computing is still relatively new and lacks many tools to help us use all of that hardware efficiently. One of the best tools available today is Hadoop, an open source framework built around the same methods (MapReduce and a distributed file system) that Google uses in its own world-class clusters. With Hadoop, you can quickly and easily create jobs that keep hundreds or even thousands of machines busy processing very large amounts of data. Join me to hear about this new and exciting method of parallel computing.
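As an illustration of what such a job looks like, here is a word-count sketch in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated `key\tvalue` text lines and the reducer receives its input sorted by key. The `run_locally` helper is a hypothetical stand-in for the cluster: it simply sorts the mapper's output in memory instead of distributing the work.

```python
from itertools import groupby

def mapper(lines):
    # Emit one "word\t1" line per word (tab-separated key/value text).
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_lines):
    # Input is sorted by key, so equal words are adjacent and can be
    # summed in a single streaming pass.
    parsed = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run_locally(lines):
    # Hypothetical local stand-in for Hadoop's distributed shuffle/sort.
    return list(reducer(sorted(mapper(lines))))

print(run_locally(["the quick fox", "The lazy dog"]))
```

Because the mapper handles each line independently and the reducer needs only adjacent keys, the framework can spread both phases across many machines without changing this logic.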
Joe Gregorio, Developer Advocate, Google
Time/Location: 2:30-3:30, Room 106/107
An introduction to building applications in Google App Engine, and to techniques you can use to improve your application's performance once it grows beyond a simple size. We'll discuss Python runtime tricks, various types of caching, dynamic module loading, and App Engine Python idioms. We will also cover common strategies for scaling web applications to millions of users.
MenTaLguY, Software consultant, writer, graphic artist, and speaker
Time/Location: 3:45-4:45, Room 105
Threads have become a popular way to exploit multicore systems, to simulate coroutines, and to perform asynchronous I/O, in part because many programming environments have made them the most attractive option. Unfortunately, programs natively structured around threads tend to exhibit poor resource utilization, and the use of threads encourages programming approaches that are prone to subtle bugs.
We will examine what threads are, the problems with their use, the available alternatives to the explicit use of threads, and the limitations of these alternatives. Concrete examples will be given using libraries like Intel's TBB, Java's util.concurrent, the Omnibus Concurrency Library, and Revactor.
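One of the alternatives in this space is the actor model, which Revactor brings to Ruby. The Python sketch below assumes a minimal actor built from a mailbox queue: the actor alone owns its state, and other code interacts with it only by sending messages, so no locks are needed around the counter. The `CounterActor` class and its message names are hypothetical, and the actor still runs on a thread underneath; the difference is that callers never share its mutable state.

```python
import queue
import threading

class CounterActor:
    """Owns its state; the outside world can only send it messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Process messages one at a time, in arrival order.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)
            elif message == "stop":
                break

    def send(self, message, reply_to=None):
        # Fire-and-forget: enqueue the message and return immediately.
        self._mailbox.put((message, reply_to))

    def ask(self, message):
        # Request/reply: send a private reply queue and wait for the answer.
        reply = queue.Queue(maxsize=1)
        self.send(message, reply)
        return reply.get()

actor = CounterActor()
for _ in range(1000):
    actor.send("increment")
print(actor.ask("get"))
actor.send("stop")
```

Since the mailbox is FIFO and there is a single sender here, the "get" request is handled only after all 1000 increments.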
Jeff Polakow, Researcher
Time/Location: 9:00-10:00, Room 105
An increasing number of commercial enterprises are starting to look seriously at Haskell, a higher-order, strongly typed, pure functional programming language. This talk explains the growing industrial attention to Haskell by focusing on the advantages Haskell brings to general programming tasks. I will talk about my experience using Haskell to design and implement the software infrastructure for a small trading group at Deutsche Bank. Most of the applications I write deal with such quotidian tasks as acquiring data from external sources, linking related information from different sources, searching for specific patterns, and making data available through a web server. In addition to outlining my overall system architecture and highlighting some novel aspects of my implementation, I will discuss the various pros and cons, technical and otherwise, of using Haskell in a corporate environment.
Chris Richardson, Author, Consultant
Time/Location: 3:45-4:45, Room 106/107
Traditionally, computer hardware was a scarce, expensive resource. Running performance tests often meant scavenging for machines around the office. Today, however, things are different. With Amazon's EC2, a cluster of servers is now just a web service call away. In this presentation you will learn about the design and implementation of Cloud Tools, a Groovy-based framework for deploying and testing Java EE applications on EC2. This framework provides a simple (internal) DSL for configuring a cluster (database + web container + Apache), deploying a web application, and running performance tests using JMeter. You will learn about the capabilities of EC2 and how to use it for development and deployment. We will also describe how we use Amazon S3 to work around EC2's lack of a persistent file system and to avoid time-consuming uploads of WAR files.
Jonathan Rochelle, Group Product Manager, Google
Time/Location: 10:15-11:15, Room 106/107
Google spreadsheets is best known as a web-based application for creating and sharing spreadsheets directly in the browser. It is used for a very broad range of information-sharing needs, giving anyone a simple way to create table-based data and make it accessible on the web. Add in the APIs, the Forms feature, and the ability to create Gadgets, and Google spreadsheets becomes a very simple platform for developing fast, functional web site features and collaborative content management. This session will describe, in not-too-technical terms, the tactics available for creating custom data collection apps and spreadsheet-based gadgets. We will take the group through the steps, from simple form-based data collection and form customization for webmasters, to data extraction techniques using feeds and the creation of visualizations with spreadsheet Gadgets, geared toward programmers. We will also show some public examples of things built using these tactics.
Ezra Zygmuntowicz, Merb framework creator, Co-founder of Engine Yard
Time/Location: 1:15-2:15, Room 106/107
In this talk we will cover how to "think" in terms of cloud computing. What are the new challenges to overcome? We will discuss coordinating resources in the cloud at large with our Vertebra technology, as well as coordinating your application in the cloud with RabbitMQ. Virtual machines are the new threads/processes; we will cover how this shift can affect the way you design scalable applications meant to run in the cloud.
