Multinomial Text Classification in Data Science: A Machine Learning Approach

Hi all! It’s been a long time since I last posted. The time from my last post in March until today was filled with enthusiasm and lots of learning. It was great to hear feedback from a few of my readers, who contributed their ideas and suggestions to make my previous contributions better and more optimized for their use cases.

This time, I have come up with an implementation of text classification, which is widely used in machine learning and data science applications such as binary classification, NLP (Natural Language Processing) and sentiment analysis. To begin with, it’s good to have an idea of what exactly text classification is.

Text classification is the process of labeling natural language texts with relevant classes or categories from a predefined set.

What are the applications of Text classification?

To name a few:

  • Automating CRM tasks by analyzing and assigning the tasks based on relevance
  • Analyzing the sentiment of a text to determine the mood of an audience
  • Predicting the possible class or category of a given dataset and many more…

It sounds interesting!

What are the methods to do it?

Keeping it short, there are two popular methods for doing text classification, namely:

  1. Bernoulli classification
    • This model uses binary occurrence information, i.e. whether an element occurs in a text at all, and ignores the number of occurrences (the count of the elements) while training our model.
    • The name comes from the Bernoulli trial, which has only two outcomes, like a coin that produces heads or tails when tossed: here each word is either present in a document or absent from it.
    • Because repeated occurrences are ignored, this model may make mistakes when classifying long documents.
  2. Multinomial classification
    • This model does not ignore the number of occurrences of the elements in a text while training our model.
    • It is the more general form of text classification and supports classification into multiple categories or classes.
    • The name comes from the multinomial distribution, which allows more than two outcomes, like rolling a die with 6 sides: here each word in a document is treated as one draw from the vocabulary.
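
The difference between the two models comes down to how a document is turned into features. Here is a minimal sketch of both views in plain Java. The class name TextFeatures, its methods and the simple tokenizer are hypothetical and are not taken from my linked implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
final class TextFeatures {

    // Lowercase the text and split on anything that is not a letter or digit.
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    // Multinomial view: each word maps to the number of times it occurs.
    static Map<String, Integer> countFeatures(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Bernoulli view: each word maps to 1 if it occurs at all; counts are discarded.
    static Map<String, Integer> binaryFeatures(String text) {
        Map<String, Integer> presence = new LinkedHashMap<>();
        for (String token : tokenize(text)) {
            presence.put(token, 1);
        }
        return presence;
    }

    public static void main(String[] args) {
        String text = "good movie, good cast, good music";
        System.out.println(countFeatures(text));  // {good=3, movie=1, cast=1, music=1}
        System.out.println(binaryFeatures(text)); // {good=1, movie=1, cast=1, music=1}
    }
}
```

In the multinomial view the repeated word "good" carries three times the weight, while the Bernoulli view treats it the same as a word that appears once.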

How did I do it?

Since multinomial classification is generally the more reliable of the two methods for classifying large documents, I have used it for my implementation of text classification. My implementation trains a model on a training dataset and, when test data is fed in as input, returns the accuracy (%) of the trained model, i.e. the share of test documents that were classified correctly. The higher the accuracy, the more reliable the model.

The technology stack I used to train my model and implement this application:
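
To give a rough idea of what training and measuring accuracy can look like, here is a minimal sketch. It assumes a multinomial Naive Bayes classifier with add-one smoothing, which is my choice for this illustration and not necessarily what the linked code does, and it keeps everything in memory instead of using MongoDB. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical class for illustration; assumes multinomial Naive Bayes.
final class MultinomialNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCountsPerClass = new HashMap<>();
    private final Map<String, Integer> totalWordsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    // Add one labelled document to the word counts of its class.
    void train(String label, String text) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
                wordCountsPerClass.computeIfAbsent(label, key -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum); // every occurrence is counted
            totalWordsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the class with the highest log prior plus summed log likelihoods.
    String predict(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = Math.log(docsPerClass.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCountsPerClass.get(label);
            int totalWords = totalWordsPerClass.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                // Add-one smoothing so an unseen word does not rule out a class.
                double p = (counts.getOrDefault(token, 0) + 1.0)
                        / (totalWords + vocabulary.size());
                score += Math.log(p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Percentage of test rows {label, text} whose predicted label is correct.
    double accuracy(String[][] labelledTestData) {
        int correct = 0;
        for (String[] example : labelledTestData) {
            if (example[0].equals(predict(example[1]))) correct++;
        }
        return 100.0 * correct / labelledTestData.length;
    }
}
```

A typical use would be to call train once per labelled training document and then pass held-out rows of {label, text} to accuracy.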

  1. Java 8
  2. MongoDB

Follow this link to find my implementation: https://goo.gl/V5kgkU

Feel free to share your implementations in the comments or post your queries.

Que :: Q & A Portal Built Using Ruby on Rails, Marionette JS and SQL

Hi!

Greetings!

Have a look at another of my open source projects, “Que”.

As I have said before, improvements come from constant evolution, and evolution works best when contributions come from diverse minds. Therefore, feel free to fork, clone or modify the project and contribute your updates and ideas.

It is a simple Question and Answer portal built using Ruby on Rails, Backbone JS, Marionette JS and an SQL database.

Source Code: https://github.com/karantongay/Que

The website is live on https://cryptic-stream-87241.herokuapp.com/

It features:

1. Asking Questions

2. Answering the Questions

3. Following and Unfollowing the Users

4. User Profile Section

5. Deleting questions and answers with validation, i.e. only the owner of a question or answer can delete it

This project is open for improvements and contributions.

Splinter :: Java App to Split Large Excel Files

Greetings!!

Recently, I open sourced a small application on GitHub, built using Java, that splits a large Excel file into multiple smaller files with the help of the Apache POI library. It also supports files with the .xlsx extension.

It can be used as a module in existing applications, or as a standalone application by implementing a user interface for it.

The project is completely open for contributions and improvements, so feel free to fork, clone or download it. If you have any contributions, please feel free to open a pull request on GitHub.

Please follow this link to the repository:

https://github.com/karantongay/Split-Large-Excel-File-into-smaller-excel-files-using-Apache-POI

Motivation for this project:

Most applications, especially those that deal with Excel sheets, often face issues when the Excel files are large.

Such files may not open on a computer with limited resources, and therefore cannot be used for further processing when there is a need to open the Excel file and verify its contents.

This application handles such large Excel files by splitting them into separate Excel files with a defined number of rows each, which can be opened easily.

Please feel free to share anything you feel you could contribute to make it better.
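
The core of the splitting step is working out which rows go into which output file. Here is a minimal sketch of just that arithmetic in plain Java. The class RowChunker and its method are hypothetical names for illustration, and the reading and writing of workbooks with Apache POI, which the repository handles, is left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it does not read or write Excel files.
final class RowChunker {

    // Returns {firstRow, lastRow} pairs (0-based, inclusive) that cover
    // totalRows rows in chunks of at most rowsPerFile rows each.
    static List<int[]> chunkRanges(int totalRows, int rowsPerFile) {
        if (totalRows < 0 || rowsPerFile <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and rowsPerFile > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < totalRows; start += rowsPerFile) {
            // The last chunk may be shorter than rowsPerFile.
            int end = Math.min(start + rowsPerFile, totalRows) - 1;
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 rows split into files of at most 4 rows: rows 0-3, 4-7 and 8-9.
        for (int[] range : chunkRanges(10, 4)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

Each range would then correspond to one smaller output file. Whether a header row should be repeated in every file is a design decision this sketch does not cover.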

Happy Coding!

Introduction to Kotlin – A Quick Understanding!

Greetings!

I am back after a long time, and this time with a recent headline: “Kotlin”, an upcoming programming language that offers state-of-the-art features. Let’s get to know it in brief.

Imagine writing code as you would in Java, leaving out the semicolon (;) at the end of a line or the package header, and the program still compiles and runs with the expected results. Imagine, too, that some errors which would surface as runtime exceptions in Java, such as null pointer exceptions, are caught by the compiler instead. That is how concise and safe Kotlin aims to be.

Let’s understand quickly what Kotlin exactly is and why it is a good language.

What is Kotlin?

The trend of continuously improving the programming experience has introduced another programming language, “Kotlin”, a new language from JetBrains, the company known for its IDEs. Kotlin is a statically typed programming language for modern multiplatform applications. At the recent Google I/O 2017 conference, the Android team announced first-class support for Kotlin. For Android developers, Kotlin can prove a great resource that helps solve common problems such as runtime exceptions.

Besides Android, Kotlin also supports application development for the JVM, the browser and native platforms.

Why opt for Kotlin?

The core feature of Kotlin is that it compiles to JVM bytecode, so it can use all the currently available Java frameworks and libraries. Additionally, it integrates easily with Gradle, Maven and other build systems. Moreover, a programmer who already knows Java can pick up the basics of the language quickly, simply by reading its documentation.

Secondly, since it has first-class support from Google, developers can feel confident using Kotlin for development.

Also, Kotlin can be introduced into existing projects, meaning that existing technology investments and the skills of developers are preserved.

Things are always best understood when we experience them ourselves, so go ahead and try Kotlin online here: https://try.kotlinlang.org

The best feature this online development environment provides is converting existing Java code into Kotlin. Try this as well and see how multiple lines of Java code are reduced to considerably fewer lines.
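
As a rough illustration of the difference in size, here is a small hand-written Java class (not output from the online converter; the Person class is a made-up example). The comment at the top shows a one-line Kotlin data class that provides a comparable constructor, property accessors, equals, hashCode and toString.

```java
import java.util.Objects;

// Roughly equivalent Kotlin, for comparison:
//   data class Person(val name: String, val age: Int)
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() {
        return name;
    }

    int getAge() {
        return age;
    }

    // Two persons are equal when both fields are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person that = (Person) other;
        return age == that.age && Objects.equals(name, that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "Person(name=" + name + ", age=" + age + ")";
    }
}
```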

Testing a Web Application Using Selenium IDE – A Simple Approach!

Greetings!

This tutorial is a guide to creating a Google App Engine web application and testing it using Selenium IDE. Once the basics of the Selenium commands are understood, many other test cases can be designed in a similar way. It also covers how to easily create a Google App Engine application.

The supporting files can be downloaded towards the end of this post.

Additionally, let’s understand why software testing is really important.

Every day, we use many applications that have become part of our lives, whether to track calorie intake, count the steps we walk or send emails. It is very important that such applications always perform accurately and as expected. Otherwise, their credibility can be compromised.

The reason our everyday applications serve us seamlessly, without failing to meet our needs, is the level of quality they maintain through effective testing before they are released to the market.

For example, imagine that someone uses a mobile app to control an electrical appliance, and the app was not tested well before being handed over to the user: it turns the appliance “OFF” when the user clicks “ON” and vice versa. This can be miserable if someone is controlling the appliance from a remote location and relying on the application, without knowing what exactly is happening. Fixing this issue after the application is on the market can be expensive in every respect.

Therefore, it is very important to verify and validate the system against its responsibilities and performance to help reduce future risks. In brief, this process is nothing but “Software Testing”.

Testing verifies that the system meets the different requirements such as functional, performance, security and so on. This verification is done to ensure that we are building the system right.

Additionally, software validation ensures that the system being developed is actually what the user needs.

This eventually helps improve the quality of the product, can reduce the post-release cost of service and support, and may open up opportunities to increase revenue.

Supporting files mentioned in the tutorial:
1. Booth’s Algorithm Java file

2. HTML file

3. Selenium IDE add-on for Mozilla Firefox