2021


Shazam Signature Representation of Audio for Full-text Documents

A text document search engine based on the Shazam audio fingerprinting algorithm was developed. It can match a query that consists of a fragment of the original document's content. The Google Text-to-Speech API service, as well as a raw-to-audio method, is used to transform text into audio. The Shazam audio fingerprinting method is then used to generate a signature of the text from the audio file. The engine achieved a 60% match hit ratio when using queries of around 80% of the original file. The system is scalable and efficient in both storage and computation, as the search is performed on signatures instead of the documents themselves.
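The signature-based lookup can be illustrated with a minimal sketch of Shazam-style matching: pairs of peaks are hashed, and a match is found by voting on consistent time offsets. This is not the project's implementation. In particular, `peaks_from_text` is a hypothetical stand-in that treats each byte of text as one peak, whereas the engine described above derives its peaks from audio; `FAN_OUT` and all function names are likewise assumptions.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # assumption: number of later peaks paired with each anchor peak


def peaks_from_text(text):
    """Hypothetical stand-in for spectrogram peak picking: one (time, value) per byte."""
    return list(enumerate(text.encode("utf-8")))


def fingerprint(peaks, fan_out=FAN_OUT):
    """Yield (hash, anchor_time) for each anchor peak paired with its next few peaks."""
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(documents):
    """Map every hash to the (doc_id, anchor_time) positions where it occurs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for h, t in fingerprint(peaks_from_text(text)):
            index[h].append((doc_id, t))
    return index


def best_match(index, query):
    """Vote on (doc_id, time offset); a true match concentrates votes on one offset."""
    votes = Counter()
    for h, t_query in fingerprint(peaks_from_text(query)):
        for doc_id, t_doc in index.get(h, []):
            votes[(doc_id, t_doc - t_query)] += 1
    if not votes:
        return None
    (doc_id, _offset), _count = votes.most_common(1)[0]
    return doc_id
```

Only the hash index is consulted at query time, which is the property the abstract relies on for storage and computational efficiency.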

2020


Forecasting of Tropical Cyclone Trajectories with Deep Learning

We study the forecasting of hurricane (tropical cyclone) trajectories using deep learning techniques. Hurricane trajectories exhibit highly complex and nonlinear behavior. Numerous factors, such as landscape and atmospheric effects, cause tropical cyclones to follow devious or unwavering routes. We employ a recurrent neural network (RNN)-based deep learning architecture called TrajGRU to capture these complex temporal patterns and perform forecasts. We measure forecast error in kilometers at various hourly resolutions and compare the results with two baseline methods: a naive linear predictor and long short-term memory (LSTM) networks. We demonstrate the performance gains and analyze the predictions.
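As an illustration of the baseline and the error measure, the sketch below extrapolates a track at constant velocity and scores a forecast by great-circle distance in kilometers. The abstract does not specify either detail, so both choices are assumptions, as are the function names and the mean Earth radius used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_linear_forecast(track, steps=1):
    """Extrapolate the last observed displacement `steps` time steps ahead."""
    (lat0, lon0), (lat1, lon1) = track[-2], track[-1]
    return (lat1 + steps * (lat1 - lat0), lon1 + steps * (lon1 - lon0))


def forecast_error_km(track, actual, steps=1):
    """Distance in kilometers between the naive forecast and the actual position."""
    lat, lon = naive_linear_forecast(track, steps)
    return haversine_km(lat, lon, actual[0], actual[1])
```

A learned model would be evaluated with the same distance function, so that its error is directly comparable with this baseline at each forecast horizon.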

Headline Generation from Distinctive Summaries

In this paper, we demonstrate an LSTM-based encoder-decoder network for the headline generation task. The state-of-the-art BERT language model is used to extract context vectors. An attention mechanism is used together with the encoder-decoder network to capture relevance information in the input sequence and improve overall system performance. Several summaries are generated, and the one with the best TF-IDF score is used for headline generation instead of running the network on the whole content. Generated headlines are compared using the BLEU metric.
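The summary-selection step can be sketched as follows. The abstract does not say how the TF-IDF score of a summary is defined, so this sketch assumes the score is the sum of term-frequency times smoothed inverse-document-frequency over the summary's terms, with document frequencies counted across the candidate summaries themselves. The whitespace tokenizer and all function names are also assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Assumption: lowercase whitespace tokenization."""
    return text.lower().split()


def tfidf_score(summary, corpus):
    """Sum of TF * smoothed IDF over the summary's terms; IDF is computed over `corpus`."""
    tokens = tokenize(summary)
    if not tokens:
        return 0.0
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)
    score = 0.0
    for term, count in Counter(tokens).items():
        df = sum(term in terms for terms in doc_terms)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed to avoid division by zero
        score += (count / len(tokens)) * idf
    return score


def pick_best_summary(summaries):
    """Return the candidate whose terms are most distinctive among all candidates."""
    return max(summaries, key=lambda s: tfidf_score(s, summaries))
```

Under this definition, a summary built from terms that the other candidates do not share scores highest, which is one plausible reading of "distinctive" in the title.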

Estimation of Facial Attractiveness Level

We design and train a deep neural network for facial attractiveness level estimation. In particular, we develop an architecture based on convolution, pooling, activation, and fully-connected layers. We analyze the performance of our model on a preprocessed facial image dataset with attractiveness level annotations. In this report, we describe our architecture, training procedure, and the results obtained on the given dataset. The architecture reaches a mean absolute error of 0.42 on the validation set and 0.50 on the test set.
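For reference, the mean absolute error reported above is the average absolute difference between predicted and annotated attractiveness levels. A minimal sketch, with a hypothetical function name and input format:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |annotation - prediction| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

An error of 0.50 therefore means that predictions deviate from the annotations by half a level on average, in whatever scale the dataset uses.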

2019


Image Captioning With Attention and Transfer Learning

The aim of the project is to generate meaningful sentences describing a given image, a task referred to as Image Captioning in the literature. Due to its real-life applications, various solutions and datasets related to the Image Captioning problem have been published and are available online. In this report, we first describe the analyses performed on the dataset. Then, we describe our approach to the problem by presenting our model architecture and data flow. At every step, we compare our approach with current work in the literature and explain the reasoning behind our algorithmic decisions by stating the advantages and disadvantages of the different possibilities. Lastly, we provide results that show the performance of our approach.