A Survey of 3000 Executives Reveals How Businesses Succeed with AI – Harvard Business Review

And we expect at least a portion of current AI piloters to fully integrate AI in the near term. Telecom and financial services are poised to lead the way, with respondents in these sectors planning to increase their AI tech spend by more than 15% a year — seven percentage points higher than the cross-industry average — in the next three years.

Believe the hype that AI can potentially boost your top and bottom line. Thirty percent of early AI adopters in our survey — those using AI at scale or in core processes — say they’ve achieved revenue increases, leveraging AI in efforts to gain market share or expand their products and services. While the question of correlation versus causation can be legitimately raised, a separate analysis uncovered some evidence that AI is already directly improving profits, with ROI on AI investment in the same range as associated digital technologies such as big data and advanced analytics. Survey respondents from firms that have successfully deployed an AI technology at scale tend to rate C-suite support as being nearly twice as high as those companies that have not adopted any AI technology.

You don’t have to go it alone on AI — partner for capability and capacity. With the AI field recently picking up its pace of innovation after the decades-long “AI winter,” technical expertise and capabilities are in short supply. Our survey, in fact, showed that early AI adopters have primarily bought the right fit-for-purpose technology solutions, with only a minority of respondents both developing and implementing all AI solutions in-house.

Resist the temptation to put technology teams solely in charge of AI initiatives. Compartmentalizing accountability for AI with functional leaders in IT, digital, or innovation can result in a hammer-in-search-of-a-nail outcome: technologies being launched without compelling use cases. To ensure a focus on the most valuable use cases, AI initiatives should be assessed and co-led by both business and technical leaders, an approach that has proved successful in the adoption of other digital technologies.

Take a portfolio approach to accelerate your AI journey. AI tools today vary along a spectrum ranging from tools that have been proven to solve business problems (for example, pattern detection for predictive maintenance) to those with low awareness and currently-limited-but-high-potential utility (for example, application of AI to developing competitive strategy). Long-term: Work with academia or a third party to solve a high-impact use case (augmented human decision making in a key knowledge worker role, for example) with bleeding-edge AI technology to potentially capture a sizable first-mover advantage.

Machine learning is a powerful tool, but it’s not right for everything. Machine learning and its most prominent subfield, deep learning, have attracted a lot of media attention and received a significant share of the financing that has been pouring into the AI universe, garnering nearly 60% of all investments from outside the industry in 2016.

While it’s clear that CEOs need to consider AI’s business implications, the technology’s nascence in business settings makes it less clear how to profitably employ it. Through a study of AI that included a survey of 3,073 executives and 160 case studies across 14 sectors and 10 countries, and through a separate digital research program, we have identified 10 key insights CEOs need to know to embark on a successful AI journey.
It’s critical to look for the right tool to solve each value-creating business problem at a particular stage in an organization’s digital and AI journey.

Digital capabilities come before AI. We found that industries leading in AI adoption — such as high-tech, telecom, and automotive — are also the ones that are the most digitized. Using a battery of statistics, we found that the odds of generating profit from using AI are 50% higher for companies that have strong experience in digitization.

Be bold. In a separate study on digital disruption, we found that adopting an offensive digital strategy was the most important factor in enabling incumbent companies to reverse the curse of digital disruption.

The biggest challenges are people and processes. In many cases, the change-management challenges of incorporating AI into employee processes and decision making far outweigh technical AI implementation challenges. As leaders determine the tasks machines should handle, versus those that humans perform, both new and traditional, it will be critical to implement programs that allow for constant reskilling of the workforce. And as AI continues to converge with advanced visualization, collaboration, and design thinking, businesses will need to shift from a primary focus on process efficiency to a focus on decision management effectiveness, which will further require leaders to create a culture of continuous improvement and learning.

Total investment (internal and external) in AI reached somewhere in the range of $26 billion to $39 billion in 2016, with external investment tripling since 2013. Despite this level of investment, however, AI adoption is in its infancy, with just 20% of our survey respondents using one or more AI technologies at scale or in a core part of their business, and only half of those using three or more. (Our results are weighted to reflect the relative economic importance of firms of different sizes.) AI technologies like neural-based machine learning and natural language processing are beginning to mature and prove their value, quickly becoming centerpieces of AI technology suites among adopters.

Read More

AI Lets Astrophysicists Analyze Images 10 Million Times Faster

In one clip, a life-size, solid-looking cartoon lion stands in a lobby, facing a real dog, with appropriate-looking shadows on the tile floor moving as the lion shifts. In another, a life-size scarecrow stands on a sidewalk in front of a very real taco truck, pondering the menu, fitting in quite well with the people standing behind him.

Tango, a technology that Google first showed off in 2014, uses a combination of sensors and computer vision to help phones figure out precisely where they are in 3-D space, even in the absence of GPS. ARCore is meant to work without such hardware additions, which means it could be added to apps that a lot more people will be able to use in the near future. In a blog post, Dave Burke, vice president of Android Engineering, writes that ARCore can do things like track a phone’s location and the direction in which it’s facing by using its camera and sensors (helpful for keeping virtual objects in the same spot), figure out where there are horizontal surfaces (upon which an app may want to place, say, a virtual cup of coffee), and pay attention to real-world light to help developers figure out the most realistic ways to display their virtual objects.

An early version of ARCore is being released Tuesday, and in his post Burke says it will initially work with the company’s Pixel smartphone and Samsung’s Galaxy S8 handset, as long as they’re running the Nougat version of Android, or newer. (He writes that this would make ARCore capable of running on “millions” of devices off the bat, but really that’s just a small percentage of the more than two billion devices out there using Android.)

Google’s ARCore follows similar work by Apple and Facebook, which both released developer tools earlier this year to try to make AR more popular among their users. There have been smartphone augmented reality apps available for Android and iOS for years, but none of them work all that well or look that good: virtual images tend to float awkwardly in space, rather than fitting in with real-world surroundings, and the software doesn’t cope well with things like changing lighting conditions. Even Pokémon Go, a smash hit when it was released in the summer of 2016, doesn’t do a great job of mixing virtual creatures with reality as you see it through your smartphone screen.

Read More

A Neural Network Attempted to Write the Next Game of Thrones Book

Fans of George R.R. Martin's epic book series, A Song of Ice and Fire, have been eagerly awaiting the next installment—all while the hit HBO series "Game of Thrones" continues on its own narrative path. For example, this meandering detail takes place in the second chapter: "The dog wandered the stair, to allow the high officers to help you at home. the woods are gowned on bloody yellow and glass. It may be fewer as well as the north." Eerily, however, several of the predictions made by the bot for the next book mirror popular fan theories about what will happen to favorite characters, such as whether one will end up riding a dragon or another may get poisoned by a close adviser. And though oftentimes larger training sets are better for AI, the complexity of so many of these words complicated the training process, Hill reports. The website Inverse has joined forces with a San Francisco company, Unanimous A.I., to make predictions about what will happen in the final season of the GOT TV series, writes John Bonazzo for the New York Observer.

So to help fill the void, a software engineer trained an artificial intelligence bot to write its own version of the forthcoming novel. Software engineer (and ASOIAF fan) Zack Thoutt was inspired to create the bot after taking a course on artificial intelligence, writes Sam Hill for Motherboard. He programmed the bot as an artificial neural network, a setup consisting of thousands of different data nodes that can work in tandem to process data. Unlike a conventional computer program that has to be explicitly programmed, neural networks can modify their responses over time using the data fed into the system, similar to learning. To train his bot to write the next ASOIAF installment, Thoutt fed the neural network all 5,376 pages of the previous five books to give it a sense of the characters, places, and writing style, reports Hill. For each chapter the AI generated, Thoutt gave the bot a word count and a so-called "prime word" to kick off the section.
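Thoutt's code is not reproduced in the article; purely as an illustration of the word-level recurrent setup and "prime word" seeding described above, a minimal Keras sketch might look like the following. The vocabulary size, sequence length, and layer sizes are assumptions for illustration, not details from his project.

```python
# Hypothetical sketch of a word-level recurrent text generator seeded with a "prime word".
# Vocabulary size, sequence length, and layer sizes are illustrative guesses, not details
# taken from Thoutt's actual project.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 20000   # assumed number of distinct words in the five books
SEQ_LEN = 50         # assumed context window, in words

model = keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 256),
    layers.LSTM(512, return_sequences=True),
    layers.LSTM(512),
    layers.Dense(VOCAB_SIZE, activation="softmax"),  # probability of the next word
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

def generate_chapter(model, prime_word_id, word_count, seq_len=SEQ_LEN):
    """Seed the network with a prime word, then sample words until the target count."""
    tokens = [prime_word_id]
    while len(tokens) < word_count:
        context = np.array([tokens[-seq_len:]])   # most recent words as context
        probs = model.predict(context, verbose=0)[0]
        probs = probs / probs.sum()               # renormalize before sampling
        tokens.append(int(np.random.choice(VOCAB_SIZE, p=probs)))
    return tokens
```

After training on the tokenized books, the returned token IDs would be mapped back to words to produce a chapter-like block of prose.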

Read More

NERSC Scales Scientific Deep Learning to 15 Petaflops

A collaborative effort between Intel, NERSC and Stanford has delivered the first 15-petaflops deep learning software running on HPC platforms and is, according to the authors of the paper (and to the best of their knowledge), currently the most scalable deep-learning implementation in the world. The work described in the paper, Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data, reported that a Cray XC40 system with a configuration of 9,600 self-hosted 1.4GHz Intel Xeon Phi Processor 7250 based nodes achieved a peak rate between 11.73 and 15.07 petaflops (single-precision) and an average sustained performance of 11.41 to 13.47 petaflops when training on physics and climate based data sets using Lawrence Berkeley National Laboratory’s (Berkeley Lab) NERSC (National Energy Research Scientific Computing Center) Cori Phase-II supercomputer.

Failure to converge to a good solution is also a possibility, although Ioannis Mitliagkas, former postdoctoral scholar at Stanford and currently assistant professor at the University of Montreal, observes from both classical and modern results on asynchrony that, “on well-behaved objectives, failure to converge implies a mis-tuned system.” Thus tuning is critical.

Those updates are sent to a central parameter store called the parameter server (noted as PS in the figure), which applies the updates to the model in the order they are received. After each update, the PS sends the new model back to the worker where the update originated. Asynchronous systems do not suffer from straggler effects and are not limited by the total batch size in the same way that synchronous systems are, an important property at scale.

Along with scalability, Joe Curley, Intel’s senior director of HPC platform and ecosystem enabling, highlights the scientific accomplishments this level of performance brings to deep learning researchers, and data-intensive scientific communities such as climate and High Energy Physics (HEP).

Further, synchronous training makes the computation susceptible to jitter (e.g. the “straggler effect”), as the computation can be rate limited by the slowest node in the system during each iteration of the training procedure.[3] The straggler effect occurs when any delay on any node exceeds the ability of the implementation of the reduction operation to hide latency. In addition, using too many nodes during training can reduce the number of examples per node (i.e. the per-node mini-batch size) to the point of reduced node efficiency. In the hybrid approach, worker nodes coalesce into separate, synchronous compute groups where the workers split a mini-batch quantity of work among themselves to produce a single update to the model.

He also pointed out how the results further establish the deep learning performance capability of the Intel Xeon Phi processor based computational nodes in the Cori supercomputer. “These were not just a set of heroic runs, they have solved real problems at the scale of a top five supercomputer using new methods,” Curley said. The authors report they observed better scaling for their hybrid asynchronous updates over synchronous configurations due to reduced straggler effects.

“This has been a big engineering effort,” Mitliagkas notes. “On the Stanford side, this would not have been possible without the engineering skills and hard work of Jian Zhang.” He also points out that in large-scale HPC runs, say 10,000 nodes, the likelihood that there will be some ‘slow’ nodes is significant.
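The parameter-server flow described above (workers send updates, the PS applies them in arrival order and returns the fresh model) can be sketched schematically in Python. This is only an illustration of the idea; the paper's actual implementation is built on Intel MLSL running on Cori, and the model, gradients, and worker count below are placeholders.

```python
# Schematic illustration of asynchronous parameter-server updates.
# Not the paper's MLSL-based implementation: model, gradients, and worker count
# are placeholders, and a truly lock-free scheme would omit the lock used here.
import threading
import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.model = np.zeros(dim)
        self.lock = threading.Lock()

    def apply_update(self, grad, lr=0.01):
        # Apply updates in the order they arrive, then return the new model
        # to the worker that sent the update.
        with self.lock:
            self.model -= lr * grad
            return self.model.copy()

def worker(ps, steps, seed):
    rng = np.random.default_rng(seed)
    model = ps.model.copy()
    for _ in range(steps):
        grad = rng.normal(size=model.shape)   # stand-in for a mini-batch gradient
        model = ps.apply_update(grad)         # send the update, receive a fresh model

ps = ParameterServer(dim=10)
workers = [threading.Thread(target=worker, args=(ps, 100, i)) for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print("final model norm:", np.linalg.norm(ps.model))
```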
Results in their paper show the hybrid method performing 1.66x better than the best synchronous run, and about 10x better than the worst synchronous run. The weak scaling plots below, where the amount of work per node is kept constant, show that the scalability of the system can vary with the task.

Recognizing this complexity, Thorsten Kurth, HPC consultant at NERSC, notes: “it is unreasonable to expect scientists to be conversant in the art of hyper-parameter tuning.” Prabhat, Data and Analytics Group Lead at NERSC, Berkeley Lab, emphasized that this performance and scalability result was very much a collaborative effort that: (A) utilized a neural network update scheme from Christopher Ré’s group (Department of Computer Science at Stanford University), (B) built on software infrastructure created by the Parallel Computing Lab, Intel MKL, and Intel MLSL product teams at Intel, and (C) leveraged the world-class people and hardware resources at NERSC. Hybrid schemes, like the one presented in this paper, add an extra parameter to be tuned, which stresses the need for principled momentum tuning approaches, an active area of research.

The results presented in the paper were based on 32-bit, single-precision arithmetic because there are open questions regarding the use of reduced precision for training. Specifically, Thorsten observes, “more aggressive optimizations involving computing in low-precision and communicating high order bits of weight updates are poorly understood with regards to their implications for classification and regression accuracy for scientific datasets” [italic emphasis by the authors]. Mitliagkas makes the point that the hybrid architecture means the user does not have to choose to run entirely in synchronous or asynchronous mode but can tune the hybrid method to best fit the machine and problem.

Overall, Curley observes that the collaboration reported “reasonably good scaling performance,” as the 9,600-node cluster delivered an approximate 7,205x speedup. (Perfect scaling would have delivered a 9,600x speedup.) Curley is excited by the potential of this early work, stating, “Opportunities exist to improve performance and scaling in either future runs, or in the course of solving new problems.” Nadathur Satish, research scientist at Intel, noted, “The Intel team performed a significant amount of work to extend the Intel MLSL library to support the hybrid asynchronous code for this paper.” Specifically, the Intel MLSL team added the ability to instantiate multiple synchronous groups and interface them with a parameter server.

Prabhat notes, “For this paper, it was critical for us to demonstrate the viability of scaling Deep Learning for real scientific applications, in contrast to ImageNet.” Thorsten Kurth and Wahid Bhimiji chose to demonstrate the efficacy of training on simulated HEP data at scale to learn how to separate the rare signals of new particles from background events — without human intervention. For the paper, the team used data from an LHC simulator to identify massive supersymmetric particles in multi-jet final states as they should appear in real life at the LHC. The update scheme by Ré allows both synchronous and asynchronous updates of the ANN (Artificial Neural Network) parameters.
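As a quick sanity check on the numbers quoted above, the reported speedup of roughly 7,205x on 9,600 nodes corresponds to about 75% parallel efficiency:

```python
# Parallel efficiency implied by the scaling figures quoted above.
nodes = 9600
speedup = 7205                     # approximate reported speedup
efficiency = speedup / nodes       # perfect scaling would give 1.0
print(f"parallel efficiency = {efficiency:.1%}")   # about 75%
```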
For the task of detecting extreme weather patterns, the team developed a novel semi-supervised architecture that uses an auto-encoder to capture various patterns in the dataset and simultaneously asks the network to predict bounding boxes for known patterns (such as hurricanes, extra-tropical cyclones, and atmospheric rivers).

Figure 6: The network’s most confident (>95%) box predictions on an image for integrated water vapor (TMQ) from the test set for the climate problem.

The authors observe that the hardware efficiency of these kernels heavily depends on input data sizes and model parameters (weight matrix dimensions, number of convolutions, convolution strides, padding, etc.). As a reference, they note that DeepBench from Baidu captures the best known performance of deep learning kernels with varied input sizes and model parameters on NVIDIA GPUs and Intel Xeon Phi processors. DeepBench results show that performance on all architectures can be as high as 75-80% of peak flops for some kernels, and as low as 20-30% as the minibatch size decreases (determined by dimension ’N’ for matrix multiply and convolutions). With those caveats in place, the team reports that an Intel Xeon Phi processor 7250 can deliver a 2.09 teraflops overall flop rate for the climate network and 1.90 teraflops for the HEP network.

Conceptually, asynchronous updates (and asynchronous architectures in general) provide the ability to scale to large numbers of nodes by removing synchronization barriers. This lock-free approach allows for faster model updates, but can require more updates to yield an equally good final model. Thus asynchronous updates can make the training process run longer — meaning it can take longer to converge. The thought process behind the use of asynchronous updates is that the extra computational nodes add enough parallelism (and hence can deliver greater performance) to overcome the potentially slower convergence behavior and thus deliver an overall faster time-to-model.

Work by a number of researchers around the world is demonstrating the usefulness, performance, and scalability of deep learning on very large, data-intensive workloads. Succinctly, the more complex the problem, the larger the data set that is required to adequately represent the problem space during training. Further, this collaborative effort by Intel, NERSC, and Stanford shows that deep learning is a candidate exascale workload that can help realize the tremendous potential of computing at the exascale.

[2] NERSC is a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy.
[3] I recommend the paper “The case of the missing supercomputer performance” to better understand the impact of jitter at scale in HPC systems.
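The climate architecture is only described at a high level above; the following toy Keras sketch shows the general shape of such a semi-supervised network (a convolutional auto-encoder with an auxiliary bounding-box head). All sizes, channel counts, and the single-box output are simplifying assumptions, not the paper's design.

```python
# Toy sketch of the idea above: a convolutional auto-encoder that reconstructs the
# input climate field while an auxiliary head regresses a bounding box for a known
# pattern. Sizes, channel counts, and the single-box output are simplifying assumptions.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(128, 128, 16))          # e.g. 16 atmospheric variables per cell

# Encoder
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
code = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)

# Decoder: reconstruct the input (the unsupervised signal)
y = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(code)
y = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(y)
recon = layers.Conv2DTranspose(16, 3, strides=2, padding="same", name="reconstruction")(y)

# Auxiliary head: predict one (x, y, w, h) box for a labeled pattern (the supervised signal)
b = layers.GlobalAveragePooling2D()(code)
bbox = layers.Dense(4, name="bbox")(b)

model = keras.Model(inputs, [recon, bbox])
model.compile(optimizer="adam",
              loss={"reconstruction": "mse", "bbox": "mse"},
              loss_weights={"reconstruction": 1.0, "bbox": 0.5})
```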

Read More

Background removal with deep learning

We’ll be happy to hear thoughts and comments! Throughout the last few years in machine learning, I’ve always wanted to build real machine learning products. A few months ago, after taking the great Fast.AI deep learning course, it seemed like the stars aligned, and I had the opportunity: the advances in deep learning technology permitted doing many things that weren’t possible before, and new tools were developed that made the deployment process more accessible than ever. In the aforementioned course, I met Alon Burg, an experienced web developer, and we partnered up to pursue this goal.

Unlike image classification or detection, a segmentation model really shows some “understanding” of the images, not only saying “there is a cat in this image” but also pointing out, at the pixel level, where the cat is and what it is. So how does segmentation work? To better understand, we will have to examine some of the early works in this field.

The earliest idea was to adopt some of the early classification networks such as VGG and AlexNet. VGG was the state-of-the-art model back in 2014 for image classification, and is very useful nowadays because of its simple and straightforward architecture. With these understandings in mind, it was hypothesized that classification training can also be used, with some tweaks, for finding/segmenting the object. Early results for semantic segmentation emerged along with the classification algorithms. In this post, you can see some rough segmentation results that come from using the VGG:

Late-layer results: segmentation of the bus image, light purple (29) is the school-bus class
After bilinear upsampling

These results come from merely converting (or maintaining) the fully connected layer into its original shape, maintaining its spatial features, and getting a fully convolutional network. In the example above, we feed a 768*1024 image into the VGG, and get a layer of 24*32*1000. The 24*32 is the pooled version of the image (by 32) and the 1000 is the ImageNet class count, from which we can derive the segmentation above. To smooth the prediction, the researchers used a naive bilinear upsampling layer.

In the FCN paper, the researchers improved on the idea above. They connected some layers along the way, to allow richer interpretations, which were named FCN-32, FCN-16 and FCN-8, according to the up-sampling rate. Adding some skip connections between the layers allowed the prediction to encode finer details from the original image.
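As a rough sketch of the conversion described above (reshaping VGG's fully connected layers into convolutions and bilinearly upsampling the resulting class map), the following Keras snippet illustrates the idea. It uses the stock Keras VGG16 and is not the code behind the figures in the post.

```python
# Illustrative sketch: make VGG16 fully convolutional and upsample its class map.
# This mirrors the idea described above, not the blog authors' actual code.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

backbone = VGG16(weights="imagenet", include_top=False, input_shape=(None, None, 3))

x = backbone.output                                               # H/32 x W/32 x 512 features
x = layers.Conv2D(4096, 7, padding="same", activation="relu")(x)  # "fc6" as a convolution
x = layers.Conv2D(4096, 1, activation="relu")(x)                  # "fc7" as a convolution
logits = layers.Conv2D(1000, 1)(x)                                # one score map per ImageNet class

# Naive bilinear upsampling back to (roughly) the input resolution.
upsampled = layers.UpSampling2D(size=32, interpolation="bilinear")(logits)

fcn = Model(backbone.input, upsampled)
# For a 768*1024 input, `logits` is the 24*32*1000 map mentioned above.
```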
Together, we’ve set ourselves the following goals:
- Improving our deep learning skills
- Improving our AI product deployment skills
- Making a useful product, with a market need
- Having fun (for us and for our users)
- Sharing our experience

Considering the above, we were exploring ideas which:
- Haven't been done yet (or haven't been done properly)
- Will not be too hard to plan and implement — our plan was 2–3 months of work, with a load of one work day per week
- Will have an easy and appealing user interface — we wanted to make a product that people will use, not only for demonstration purposes
- Will have training data readily available — as any machine learning practitioner knows, sometimes the data is more expensive than the algorithm
- Will use cutting-edge deep learning techniques (which were still not commoditized by Google, Amazon and friends in their cloud platforms) but not too cutting edge (so we will be able to find some examples online)
- Will have the potential to achieve a “production ready” result

Our early thoughts were to take on some medical project, since this field is very close to our hearts, and we felt (and still feel) that there is an enormous number of low-hanging fruits for deep learning in the medical field. However, we realized that we were going to stumble upon issues with data collection and perhaps legality and regulation, which contradicted our wish to keep things simple.

Further training improved the results even more. This technique showed itself to be not as bad as might have been thought, and proved there is indeed potential in semantic segmentation with deep learning.

FCN results from the paper

The FCN unlocked the concept of segmentation, and researchers tried different architectures for this task. We also had some thoughts about the Mask R-CNN, but implementing it seemed outside of our project's scope. FCN didn’t seem relevant since its results weren’t as good as we would have liked (even as a starting point), but the two other models we’ve mentioned showed results that were not bad: the Tiramisu on the CamVid dataset, and the Unet, whose main advantage was its compactness and speed. I must say that after we first tried the Tiramisu, we saw that its results had much more potential for us, since it had the ability to capture sharp edges in an image. On the other hand, the Unet did not seem fine-grained enough, and its results seemed a bit blobby.

Unet blobbishness

After having our general direction set with the model, we started looking for proper datasets. The most common datasets for segmentation were the COCO dataset, which includes around 80K images with 90 categories, the VOC Pascal dataset with 11K images and 20 classes, and the newer ADE20K dataset. We chose to work with the COCO dataset, since it includes many more images with the class “person,” which was our class of interest.

Considering our task, we pondered whether to use only the images that were super relevant for us, or a more general dataset. On one hand, using a more general dataset with more images and classes would allow the model to deal with more sceneries and challenges. If we were to introduce the model to the entire COCO dataset, we would end up with the model seeing each image twice (on average), so trimming it a little bit would be beneficial. Additionally, it would result in a more focused model for our purposes.

One more thing that is worth mentioning — the Tiramisu model was originally trained on the CamVid dataset, which has some flaws, but most importantly its images are very monotonous: all images are road pics from a car.
As you can easily understand, learning from such a dataset (even though it contains people) had no benefit for our task, so after a short trial, we moved ahead.

Images from the CamVid dataset

The COCO dataset ships with a pretty straightforward API which allowed us to know exactly which objects are in each image (according to the 90 predefined classes). After some experimenting, we decided to dilute the dataset: first we filtered only the images with a person in them, leaving us with 40K images. Finally, we kept only the images where 20%-70% of the image is tagged as a person, removing the images with a very small person in the background, or some kind of weird monstrosity (unfortunately not all of them).

Moreover, the Tiramisu adds skip connections to the up-sampling layers, like the Unet. If you recall, this architecture is congruent with the idea presented in the FCN: using a classification architecture, up-sampling, and adding skip connections for refinement.

Tiramisu architecture in general

The DenseNet model can be seen as a natural evolution of the ResNet model, but instead of “remembering” every layer only until the next layer, the DenseNet remembers all layers throughout the model. Our second choice was a background removal product. Background removal is a task that is quite easy to do manually, or semi-manually (Photoshop, and even PowerPoint, has such tools) if you use some kind of a “marker” and edge detection; see here an example. You might expect 1600 layers because it is a 100-layer Tiramisu; however, the up-sampling layers drop some filters.

DenseNet model sketch — early filters are stacked throughout the model

We trained our model with the schedule described in the original paper: standard cross-entropy loss, RMSProp optimizer with a 1e-3 learning rate and small decay. This also allowed us to save the model periodically with every improvement in results, since we trained it on much more data (the CamVid dataset which was used in the article contains less than 1K images). Additionally, we trained it on only 2 classes: background and person, while the paper had 12 classes. We first tried to train on some of COCO's classes; however, we saw that this didn't add too much to our training.

Some dataset flaws hindered our score:
- Animals — our model sometimes segmented animals, which of course leads to a low IoU. Adding animals to our task, in the same main class or as another, would probably have improved our results.
- Body parts — since we filtered our dataset programmatically, we had no way to tell if the person class is actually a person or some body part like a hand or foot. These images were not in our scope, but still emerged here and there. (Animal, body part, hand-held object)
- Handheld objects — many images in the dataset are sports related. As in the animal case, adding them as part of the main class or as a separate class would, in our opinion, help the performance of the model. (Sporting image with an object)
- Coarse ground truth — the COCO dataset was not annotated pixel by pixel, but with polygons. Sometimes it's good enough, but other times the ground truth is very coarse, which possibly hinders the model from learning subtleties. (Image and (very) coarse ground truth)

Our results were satisfying, though not perfect: we reached an IoU of 84.6 on our test set, while the current state of the art is 85. This number is tricky though: it fluctuates throughout different datasets and classes. There are classes which are inherently easier to segment, e.g. houses and roads, where most models easily reach results of 90 IoU.
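Going back to the dataset dilution described earlier in this section, a minimal sketch of that filtering step using the COCO API might look like this. The annotation-file path is an assumption, and computing area ratios from the "person" annotations is an approximation of the authors' exact criterion.

```python
# Hedged sketch of the COCO "dilution" step described above: keep only images where
# the 'person' annotations cover 20%-70% of the image. The annotation path is an
# assumption, and this approximates rather than reproduces the authors' filtering.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2014.json")   # assumed path
person_id = coco.getCatIds(catNms=["person"])[0]
person_img_ids = coco.getImgIds(catIds=[person_id])   # roughly the 40K person images

kept = []
for img_id in person_img_ids:
    img = coco.loadImgs(img_id)[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id, catIds=[person_id], iscrowd=None))
    person_fraction = sum(a["area"] for a in anns) / (img["height"] * img["width"])
    if 0.2 <= person_fraction <= 0.7:                  # drop tiny people and odd crops
        kept.append(img_id)

print(f"kept {len(kept)} of {len(person_img_ids)} person images")
```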
To ease this difficulty, we helped our network focus on a single class and a limited type of photos. We still don't feel our work is as “production ready” as we would want it to be, but we think it's a good time to stop and discuss our results, since around 50% of the photos will give good results.

Here are some good examples to give you a feel of the app's capabilities:

Image, ground truth, our result (from our test set)

A very important part of training neural networks is debugging. When starting our work, it was very tempting to get right to it, grab the data and the network, start the training, and see what comes out. However, we found out that it is extremely important to track every move, and to make tools for ourselves for examining results at each step.

Here are the common challenges, and what we did about them:
- Early problems — the model might not be training at all. Here is a good post about this subject.
- Debugging the network itself — after making sure there are no crucial issues, the training starts, with the predefined loss and metrics. This turned out to be an important question, since the more specific a model is in terms of objects, angle, etc., the higher quality the separation will be. When starting our work, we thought big: a general background remover that would automatically identify the foreground and background in every type of image. I must say we still haven't found the perfect method, except for fervently writing up our configurations (and auto-saving the best models with a Keras callback, see below).
- Debugging tool — doing all the above got us to a point where we could examine our work at every step, but not seamlessly. Therefore, the most important step was combining the steps above together and creating a Jupyter notebook which allowed us to seamlessly load every model and every image, and quickly examine its results. This way we could easily see differences between models, pitfalls and other issues.

Here is an example of the improvement of our model, through tweaking of parameters and extra training. For saving the model with the best validation IoU so far (Keras provides very nice callbacks to make these things easier):

callbacks = [keras.callbacks.ModelCheckpoint(hist_model, verbose=1, save_best_only=True, monitor='val_IOU_calc_loss'), plot_losses]

In addition to the normal debugging of possible code errors, we've noticed that model errors are “predictable,” like “cutting” body parts that seem out of the general body contour, “bites” on large segments, unnecessarily extending body parts, poor lighting, poor quality, and many other details. Now let's see some of our model's difficulties:
- Clothes — very dark or very light clothing sometimes tends to be interpreted as background.
- “Bites” — otherwise good results had some bites in them. (Clothing and bite)
- Lighting — poor lighting and obscurity are common in images, but not in the COCO dataset. Therefore, apart from the standard difficulty models have in dealing with these things, ours hadn't even been prepared for harder images. We reached these results very close to the release, so we haven't had the chance to apply the basic practice of data augmentation.

We trained the model after resizing the images to 224x224. Further training with more data and larger images (the original size of COCO images is around 600x1000) would also be expected to improve results. At some stages, we saw that our results were a bit noisy at the edges.
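Expanding the checkpoint callback quoted above into a fuller picture of the training setup (cross-entropy loss, RMSProp at 1e-3, checkpointing on the best validation IoU), here is a hedged, self-contained sketch. The tiny stand-in model, the metric and file names, and the commented-out fit call are placeholders, not the authors' code.

```python
# Hedged sketch of the training setup described above. The model, metric name, and
# file name are placeholders; the blog's snippet monitored 'val_IOU_calc_loss' instead.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def iou_metric(y_true, y_pred):
    """Rough IoU for the 'person' channel (class 1 of 2)."""
    pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.float32)
    true = tf.cast(tf.argmax(y_true, axis=-1), tf.float32)
    inter = tf.reduce_sum(pred * true)
    union = tf.reduce_sum(tf.clip_by_value(pred + true, 0.0, 1.0))
    return inter / (union + 1e-7)

# Placeholder segmentation model: 2 output classes (background / person) at 224x224.
inputs = keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
outputs = layers.Conv2D(2, 1, activation="softmax")(x)
model = keras.Model(inputs, outputs)

model.compile(
    loss="categorical_crossentropy",                        # standard cross-entropy loss
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-3), # the blog also used a small decay
    metrics=[iou_metric],
)

callbacks = [
    keras.callbacks.ModelCheckpoint(
        "tiramisu_best.h5", monitor="val_iou_metric", mode="max",
        save_best_only=True, verbose=1,                     # keep only the best-so-far model
    ),
]
# model.fit(train_gen, validation_data=val_gen, epochs=100, callbacks=callbacks)
```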
One common way to sharpen such edges is CRF post-processing. In this blog post, the author shows a slightly naive example of using a CRF. However, it wasn’t very useful for our work, perhaps because it generally helps when results are coarser.

Even with our current results, the segmentation is not perfect. Here is an example of state-of-the-art matting, published earlier this year at an NVIDIA conference.

Matting example — the input includes the trimap as well

The matting task is different from other image-related tasks, since its input includes not only an image but also a trimap — an outline of the edges of the image — which makes it a “semi-supervised” problem. We experimented with matting a little bit, using our segmentation as the trimap; however, we did not reach significant results. One more issue was the lack of a proper dataset to train on.

As said in the beginning, our goal was to build a significant deep learning product. Therefore, we decided to focus on selfies and human portraits.

Background removal of an (almost) human portrait

A selfie is an image with a salient and focused foreground (one or more “persons”), which guarantees us a good separation between the object (face + upper body) and the background, along with quite a constant angle, and always the same object (person). With these assumptions in mind, we embarked on a journey of research, implementation, and hours of training to create a one-click, easy-to-use background removal service.

The main part of our work was training the model, but we couldn’t afford to underestimate the importance of proper deployment. Training a model, on the other hand, is tricky — training, especially when done overnight, requires careful planning, debugging, and recording of results. It is also not easy to balance between research and trying new things, and the mundane training and improving. Since we use deep learning, we always have the feeling that the best model, or the exact model we need, is just around the corner, and another Google search or article will lead us to it. But in practice, our actual improvements came from simply “squeezing” more and more out of our original model. Good segmentation models are still not as compact as classification models (e.g. SqueezeNet), and we actively examined both server and browser deployment options.

If you want to read more details about the deployment process(es) of our product, you are welcome to check out our posts on the server side and client side. If you want to read about the model and its training process, keep going.

When examining deep learning and computer vision tasks which resemble ours, it is easy to see that our best option is the semantic segmentation task. Other strategies, like separation by depth detection, also exist, but didn’t seem ripe enough for our purposes. Semantic segmentation is a well-known computer vision task, one of the top three, along with classification and object detection.

Read More

Artificial intelligence analyzes gravitational lenses 10 million times faster

Researchers from the Department of Energy's SLAC National Accelerator Laboratory and Stanford University have for the first time shown that neural networks – a form of artificial intelligence – can accurately analyze the complex distortions in spacetime known as gravitational lenses 10 million times faster than traditional methods. "Analyses that typically take weeks to months to complete, that require the input of experts and that are computationally demanding, can be done by neural nets within a fraction of a second, in a fully automated way and, in principle, on a cell phone's computer chip," said postdoctoral fellow Laurence Perreault Levasseur, a co-author of a study published today in Nature.

The Large Synoptic Survey Telescope (LSST), for example, whose 3.2-gigapixel camera is currently under construction at SLAC, will provide unparalleled views of the universe and is expected to increase the number of known strong gravitational lenses from a few hundred today to tens of thousands. "We won't have enough people to analyze all these data in a timely manner with the traditional methods," Perreault Levasseur said. "Neural networks will help us identify interesting objects and analyze them quickly. This will give us more time to ask the right questions about the universe."

A Revolutionary Approach

Neural networks are inspired by the architecture of the human brain, in which a dense network of neurons quickly processes and analyzes information. Once the first layer has found a certain feature, it transmits the information to the next layer, which then searches for another feature within that feature, and so on. "The amazing thing is that neural networks learn by themselves what features to look for," said KIPAC staff scientist Phil Marshall, a co-author of the paper. "This is comparable to the way small children learn to recognize objects. You don't tell them exactly what a dog is; you just show them pictures of dogs." But in this case, Hezaveh said, "It's as if they not only picked photos of dogs from a pile of photos, but also returned information about the dogs' weight, height and age."

Although the KIPAC scientists ran their tests on the Sherlock high-performance computing cluster at the Stanford Research Computing Center, they could have done their computations on a laptop or even on a cell phone, they said. In fact, one of the neural networks they tested was designed to work on iPhones. "Neural nets have been applied to astrophysical problems in the past with mixed outcomes," said KIPAC faculty member Roger Blandford, who was not a co-author on the paper. "But new algorithms combined with modern graphics processing units, or GPUs, can produce extremely fast and reliable results, as the gravitational lens problem tackled in this paper dramatically demonstrates."

Lightning Fast Complex Analysis

The team at the Kavli Institute for Particle Astrophysics and Cosmology (KIPAC), a joint institute of SLAC and Stanford, used neural networks to analyze images of strong gravitational lensing, where the image of a faraway galaxy is multiplied and distorted into rings and arcs by the gravity of a massive object, such as a galaxy cluster, that's closer to us. The distortions provide important clues about how mass is distributed in space and how that distribution changes over time – properties linked to invisible dark matter that makes up 85 percent of all matter in the universe and to dark energy that's accelerating the expansion of the universe.
"There is considerable optimism that this will become the approach of choice for many more data processing and analysis problems in astrophysics and other fields."

Prepared for Data Floods of the Future

"The neural networks we tested – three publicly available neural nets and one that we developed ourselves – were able to determine the properties of each lens, including how its mass was distributed and how much it magnified the image of the background galaxy," said the study's lead author Yashar Hezaveh, a NASA Hubble postdoctoral fellow at KIPAC.

More information: Yashar D. Hezaveh et al., "Fast automated analysis of strong gravitational lenses with convolutional neural networks," Nature (2017).
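The networks evaluated in the paper are not spelled out in this article; as a hedged illustration of the general approach (a convolutional network that regresses lens-model parameters directly from an image), a minimal Keras sketch could look like the following. The input size and the particular set of output parameters are assumptions for illustration.

```python
# Illustrative sketch: a small CNN that regresses lens-model parameters from an image.
# The input size and the particular set of output parameters are assumptions for
# illustration, not the architectures evaluated in the Nature paper.
from tensorflow import keras
from tensorflow.keras import layers

N_PARAMS = 5   # e.g. Einstein radius, two ellipticity components, two lens-center offsets

model = keras.Sequential([
    keras.Input(shape=(96, 96, 1)),                 # single-band lens image
    layers.Conv2D(32, 5, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(N_PARAMS),                         # direct regression of lens parameters
])
model.compile(optimizer="adam", loss="mse")
# model.fit(simulated_images, true_parameters, ...)  # trained on simulated lens images
```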

Read More

How the ‘Facebook of music’ is using big data to find the next pop star

"Hundreds of millions of minutes of music are streamed on our site, and we send hundreds of millions of emails on behalf of our artists." With its big-data ear "listening" to every artist on its site, and with a tiered system of music experts distinguishing which new acts have talent, ReverbNation's robust curation system can now single out rising independent acts and present them with career-making opportunities. Once ReverbNation has identified a new crop of promising artists, the company then acts as a bridge for those artists to reach record label contracts, lineup placements at major music festivals like Bonnaroo and Summerfest, and break-out appearances on TV and media outlets.

Shifting the music industry model

Alabama Shakes, another Grammy-winning act that started out on ReverbNation. (Getty Images)

In discussing ReverbNation's place in a shifting music industry, Simon Perry described how, in his view, the industry's rapid growth in recent years belies an important, underlying problem. "My fear is that this two years of growth that we’ve had in the music industry represents something of a false dawn on the back of streaming companies like Spotify," he said. "It's been built on the back of a business that's losing money and doesn't have a long-term business model." Perry contends that for the industry to stay afloat, record labels will need to "cut their costs of goods sold" by getting to artists earlier and signing them to cheaper initial contracts — rather than continuing the long-standing tradition of participating in high-cost bidding wars over artists that may or may not make it. And that's where ReverbNation, with its curation system for new artist identification, has become a vital tool and A&R-like scout for labels and publishers.

When we spoke, Perry was in the midst of conducting an artist search for Sony/ATV Music, the world's largest music publisher. Taking the heavy lifting out of the publisher's work in scouting new artists, Perry and his team presented Sony/ATV with a select group of up-and-coming artists from across the globe, all of whom filtered their way up through ReverbNation's curation system. "A lot of the artists were from the middle of nowhere, like 20 miles outside of Reykjavík, Iceland, for instance," Perry said. "And they were like, 'Wow, these artists would never come to our attention.' Because by the time they do, it's too late, and they're competing with UMPG [Universal Music Publishing Group] and Warner/Chappell." "These are artists that are great, and no one's picked up on them yet," he continued. "But because we have a 360-degree purview, we have a crow's nest view of emerging artists, and we're listening to all of them through our site, we are able to identify these acts earlier than anyone else."

No longer simply a platform for uploading and discovering new music, the site has employed big-data analysis and large-scale human curation of its artists to become the go-to "farm system" for record labels and music festivals seeking out new talent. By performing what Perry calls the "huge undertaking" of analyzing the dense volumes of music and personal information that artists upload to their site, the company has been able to identify which acts stand out as potential successes early on in their careers. "Artists tell us everything about themselves."

Read More

Google and Microsoft Can Use AI to Extract Many More Ad Dollars from Our Clicks

Microsoft tells WIRED that it constantly tests new machine learning technologies in its advertising system. “Online advertising is perhaps by far the most lucrative application of AI [and] machine learning in the industry,” says John Cosley, director of marketing for Microsoft search advertising. Bing recently started using new deep learning algorithms to better understand the meaning of search queries and find relevant ads, he says.

Research papers on using deep learning for ads may undersell both its true power and the challenge of tapping into it. Companies carefully scrub publications to avoid disclosing corporate secrets. The company has released anonymized logs of millions of ad clicks that Google and others have used in papers on improving click predictions. Perhaps not surprisingly, Rajan believes deep learning still has much more to offer the ad industry. For example, it could figure out long-term cause and effect relationships between what you see or do online today and what you click on or buy next week. “Being able to model the timeline of user interest is something that the deep models are able to do a lot better,” she says.

That Google and Microsoft are getting better at predicting our desires and clicks can be seen as a good thing. Benjamin Edelman, a professor at Harvard Business School, has published research suggesting Google search is biased toward the company’s own services and designed to unfairly force corporations into spending heavily on ads for their own trademarks. (Google has been fined $2.7 billion for the former and successfully defended multiple lawsuits alleging the latter.) Such market-warping practices could be boosted by machine learning too. “If machine learning can improve the efficiency of their advertising platform by showing the right ad to the right guy, then more power to them—they are creating value,” Edelman says. “But a lot of the things that Google has done haven’t enlarged the market.” In advertising, as in many other areas, AI can give tech companies great power—and responsibility.

A recent research paper from Microsoft’s Bing search unit notes that “even a 0.1 percent accuracy improvement in our production would yield hundreds of millions of dollars in additional earnings.” It goes on to claim an improvement of 0.9 percent on one accuracy measure over a baseline system. Google, Microsoft, and other internet giants understandably do not share much detail on their ad businesses’ operations. They all describe significant gains in predicting ad clicks using deep learning, the machine learning technique that sparked the current splurge of hope and investment in AI.

Google CEO Sundar Pichai has taken to describing his company as “AI first.” Its balance sheet is definitively ads first. Google reported $22.7 billion in ad revenue for its most recent quarter, comprising 87 percent of parent company Alphabet’s revenue. Earlier this month, researchers from Google’s New York office released a paper on a new deep learning system to predict ad clicks that might help expand those ad dollars further. The authors note that a company with a large user base can greatly increase revenues with “a small improvement,” then show their new method beats other systems “by a large amount.” It did so while also requiring much less computing power to operate.

Alibaba, the Chinese ecommerce company and one of the world’s largest retailers, also has people thinking about boosting its billions in annual ad revenue with deep learning.
It was tested on anonymized logs from some of the hundreds of millions of people who use its site each day. Alibaba’s researchers tout the power of deep learning to outperform conventional recommendation algorithms, which can sometimes stumble on the sheer diversity of users’ online lives.
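None of these production systems is public; as a hedged sketch of the basic deep click-prediction setup the article describes (sparse user and ad features embedded and fed to a small network that outputs a click probability), the snippet below uses feature names and vocabulary sizes invented purely for illustration.

```python
# Hedged sketch of a deep click-through-rate (CTR) model of the general kind described
# above: sparse categorical features are embedded and fed to an MLP that outputs a
# click probability. Feature names and vocabulary sizes are invented for illustration.
from tensorflow import keras
from tensorflow.keras import layers

def embedded_input(name, vocab_size, dim=16):
    inp = keras.Input(shape=(1,), name=name, dtype="int32")
    emb = layers.Flatten()(layers.Embedding(vocab_size, dim)(inp))
    return inp, emb

user_in, user_emb = embedded_input("user_id", 1_000_000)
ad_in, ad_emb = embedded_input("ad_id", 100_000)
query_in, query_emb = embedded_input("query_category", 10_000)

x = layers.Concatenate()([user_emb, ad_emb, query_emb])
x = layers.Dense(256, activation="relu")(x)
x = layers.Dense(128, activation="relu")(x)
p_click = layers.Dense(1, activation="sigmoid", name="click_probability")(x)

model = keras.Model([user_in, ad_in, query_in], p_click)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
# Expected value per impression can then be estimated as bid * predicted click probability.
```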

Read More

Google ARCore gives Android users augmented reality without Tango – The Verge

Last week in San Francisco, Google showed me an app called Oz. Oz is a kind of augmented reality picture book: it places animated characters from The Wizard of Oz into the physical world, as viewed through a smartphone camera. I’d tried it a few months earlier at Google I/O, running on the Tango AR platform, and the content hadn’t changed. Bavor says, “That really let us learn a lot, figure out what the use cases are, and push forward the technology — out ahead of what would have been possible with standard smartphone hardware.”

Now that Google considers ARCore good enough for a wide release, Tango-branded devices — like the Asus ZenFone AR that came out just a few weeks ago — seem to be a thing of the past. “I think Tango fades into the background as more an enabling technology that kind of works behind the scenes,” says Bavor.

The first is motion tracking, which estimates a phone’s relative location based on internal sensors and video footage — so you can pin objects in one place and walk around them. But the experience was far more interesting — because for the first time, it could run on a phone that I use every day. In a simple demo app, you can set a little Android mascot down in a virtual forest, where it’ll wave when you hold your phone to its face. These are the same kind of capabilities you’ll find in Apple’s ARKit, and I haven’t spent enough time with either platform to rigorously compare their quality. But my controlled demo at Google’s offices was one of the best experiences I’ve had with phone-based AR. Objects didn’t jitter when I walked around them, the way I’ve seen even some official ARKit demos do. Props were surprisingly good at popping back into place when I turned the camera away or covered it up, although they couldn’t recover when I lowered the phone and strolled around the conference room. But Bavor promises that for basic AR tracking, Google has optimized ARCore’s performance more than an outside developer could do. “The level of quality, the capability, the things it can do, I think will be several levels above the other solutions out there.”

Experienced developers can use Java/OpenGL, Unity, and Unreal, and people who are new to 3D design can export ARCore objects from Google’s Tilt Brush VR painting app, or the VR modeling tool Blocks, which Google launched last month. Google is also releasing two experimental, AR-focused builds of Chromium: an Android-based web browser using ARCore, and an iOS-based one based on Apple’s ARKit. I used the Android browser to test a limited version of shopping site Wayfair’s furniture preview tool, which exists as a dedicated app for Tango. It wasn’t quick to load, but once it did, it worked about as smoothly as the app-based equivalent.

One app called Constructor, for example, relies on Tango’s dedicated infrared depth-sensing camera to create detailed 3D meshes. “The environment understanding, as good as it is, is really kind of detecting surfaces to place things on, as opposed to the full 3D structures,” says Bavor. Things like Wayfair’s furniture previews might be less accurate as a result, although AR director of product Nikhil Chandhok says that “for all the apps that we think that users want,” the difference is negligible.

After three years, Tango developers have found some fun and interesting things to do with AR. Bavor says that people “light up” when they place impossible objects into the world, and interior design apps seem like a natural fit.
On ARCore, someone even hacked together an app for Google’s complicated espresso machine — if you hold up your phone, you’ll see instructions like “Put your grounds here” or “Don’t touch this piece, it’s hot” overlaid on the camera image. With VPS, you could conjure an AR prop and come back to it much later, or even leave it for someone else to find.

Google’s augmented reality program could also intersect with its push for visual search. One of the ARCore team’s members is Jon Wiley, formerly the lead designer of Google Search. Now the company’s director of immersive design, he thinks combining ARCore with a visual search tool like Google Lens could pull human-computer interaction more toward the “human” side of the spectrum. If smartphones are going to follow our thought processes and not the other way around, they need to see the world like we do, Wiley says. “Getting the phone and getting the real world to line up is an incredible technical challenge, but it also offers the opportunity to have a much more intuitive interface.” For an example of how this might work, imagine searching for instructions — say, a guide to that complicated espresso machine — by showing Google a picture of the object. Visual search could identify it automatically, and augmented reality could offer an overlay of instructions, instead of a link to a YouTube video or written manual. “We’re working very closely with the Google Lens team, and I see ARCore as one of the many ingredients that will go into experiences like Lens,” says Bavor. “Not anything to announce on that right now, but let’s just say we think ARCore is going to make all that stuff more interesting, more powerful, and more useful for people.”

But the company promised a similar laundry list of partnerships for the Daydream VR platform, and several of those phone makers still haven’t delivered. It’s being released right away on the popular Galaxy S8, and you don’t need special accessories to use it. ARCore can also benefit from the work iOS developers have done with ARKit, if it’s easy enough to port their apps to Android. It’s growing the entire AR space enough that people are constantly using it with services like Maps and Lens, regardless of operating system. “We’re here to build great products that a lot of people use, and that likely means for those applications, being where the users are.”

Augmented reality offers an entirely new way of looking at the world — and no matter what kind of window you’re looking through, Google is designing the tools to interpret what you see. It’s launching on the year-old Google Pixel and Samsung Galaxy S8 phones, supported by Android 7.0 Nougat as well as its recently released successor Android Oreo. Google is under obvious pressure to compete with Apple’s lightweight version of AR, which has produced a small wave of clever experiments since its announcement in June. But the company’s head of augmented and virtual reality, Clay Bavor, describes ARCore as an intentional long-term outgrowth of Tango. “Our approach with Tango was to un-constrain ourselves,”

Read More

Virtual Reality Is A Growing Reality In Health Care

Now, while there isn’t yet a “rush” to develop video games for health (certainly compared to the much larger overall video game market), efforts are growing. As Leigh Christie, who directs the Isobar NowLab for North and South America, explained, “The headsets currently used in VR are still bulky and not the most comfortable, and the visual presentation and interactivity continue to improve.” Some health care professionals can be quite exacting when something doesn’t look, feel, or even smell exactly like the real thing. (Game developers, leaving out the actual smell of vomit in a VR game is probably OK.)

While there are obstacles, both real and virtual, to overcome before VR can become more mainstream in health care, the Games for Change Festival did showcase some apps that are starting to make a difference. One example is Isobar’s Common Ground VR. This game aims to simulate, at least for a little bit, what it’s like to have a visual disability like macular degeneration or glaucoma or a disability that restricts your ability to reach (e.g., being in a wheelchair). The 14th installment of this annual event that brings together people who make video games to help society, people, and health also included a one-day Virtual Reality (VR) for Change Summit on August 2. And this VR Summit showed just how much of a reality VR for health care is becoming.

Another example is Kognito’s simulation game in which you can play roles such as someone talking to a child about substance abuse, a student potentially in psychological distress, a person contemplating suicide, or a patient who isn’t compliant with taking medications, as described in a publication in the journal mHealth. Ron Goldman, co-founder and CEO of Kognito, explained, “By providing players with hands-on practice in navigating critical health conversations with virtual, fully animated virtual humans, we are able to build their confidence and skills to lead similar conversations in real life.”

Training health professionals seems like the most immediate application of VR. An increasing number of health professionals and educators seem open to the idea of using gaming to supplement and enhance traditional health education that has tended to focus on two extremes: direct patient contact and textbook-and-lecture-based learning. The former is limited by the patients who happen to be available, and the latter is limited by the fact that it can be really, really, really dull and virtually unrealistic. As Jenn McNamara, Vice President of Serious Games and Strategic Partnerships for BreakAway Games, related, “Seeing researchers in the healthcare community not only interested in using games for training and assessment of medical professionals but putting time and resources towards the validation of games for these applications is tremendously exciting.” For example, Breakaway, Ltd, has been working with partners to develop simulation platforms such as Pediatric Sim, a game that teaches and assesses performance on 7 different pediatric emergency scenarios (anaphylaxis, bronchiolitis, diabetic ketoacidosis, respiratory failure, seizure, septic shock, and supraventricular tachycardia), and the USC Standard Patient Studio, which allows you to talk to different types of patients that visit you in a virtual doctor’s office. One could easily see how such games could morph more and more into VR games that get closer and closer to “real” interactions.
The Games for Change Festival started with 40 or so people in a conference room who realized that video games offer a way to connect with audiences and reach audiences that are tough to reach. The festival concluded with a cocktail reception at VR World NYC that included VR games, sushi, and beer…which for some people is the definition of Nirvana. Previously, many gamers have had to argue the indirect benefits of playing, such as improving hand-eye coordination (as described in this study published in Psychological Science) and problem solving skills while stealing cars (Grand Theft Auto), gathering abnormally large mushrooms (Super Mario Brothers), avoiding a massive gorilla who seems to have an endless supply of barrels (Donkey Kong), or saving the Universe (Halo).

Read More