Multinomial Text Classification in Data Science: A Machine Learning Approach

Hi all! It’s been a long time since my last post. The months between that post in March and today were filled with enthusiasm and a lot of learning. It was great to hear feedback from a few of my readers, who contributed ideas and suggestions for improving my previous contributions and adapting them to their own use cases.

This time, I have come up with an implementation of text classification, a technique that shows up in many machine learning and data science applications, such as binary classification, NLP (Natural Language Processing) and sentiment analysis. To begin with, it’s good to have an idea of what exactly text classification is.

Text classification is the process of labeling natural language texts with relevant classes or categories from a predefined set.

What are the applications of Text classification?

To name a few:

  • Automating CRM tasks by analyzing and assigning the tasks based on relevance
  • Analysing the sentiments from the text to determine the mood of an audience
  • Predicting the possible class or category of a given dataset and many more…

It sounds interesting!

What are the methods to do it?

Keeping it short, there are two popular methods for doing text classification:

  1. Bernoulli classification
    • This model uses binary occurrence information: while training, it records only whether an element (for example, a word) occurs in a text and ignores how many times it occurs.
    • For example, each word is treated like a coin toss with two outcomes, heads or tails: it is either present in the document or absent.
    • Because it ignores counts, this model may make mistakes when classifying long documents.
  2. Multinomial classification
    • This model keeps the number of occurrences of each element in a text while training the model.
    • It is the more general form of text classification and supports classification into multiple classes.
    • For example, it can be used to classify a document into one of several classes; each word in the document is like the roll of a die with one side per vocabulary word, so a repeated word counts every time it appears.
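
To make the difference between the two methods concrete, here is a minimal Java sketch of the two ways of looking at the same text. It is only an illustration and is not taken from my linked implementation; the class and method names (TermFeatures, termCounts, termPresence) are made up for this example, and the tokenizer is a deliberately naive split on non-word characters.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class TermFeatures {
    // Multinomial view: how many times each term occurs in the text.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Bernoulli view: only whether each term occurs, the counts are dropped.
    static Set<String> termPresence(String text) {
        return termCounts(text).keySet();
    }

    public static void main(String[] args) {
        String doc = "good good good movie";
        System.out.println(termCounts(doc));   // {good=3, movie=1}
        System.out.println(termPresence(doc)); // [good, movie]
    }
}
```

For a long document, the counts carry a lot of information that the presence-only view throws away, which is one way to see why the multinomial method tends to suit large documents better.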

How did I do it?

Since multinomial classification is the more reliable of the two methods for large documents, I used it for my implementation of text classification. The implementation trains a model on a training dataset and then reports the accuracy (%) of the trained model on test data fed to it as input. The higher the accuracy, the more reliable the model.
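
The train-then-measure-accuracy flow described above can be sketched in a few lines of Java. This is not the code behind the link below. It is a small in-memory illustration that assumes a multinomial naive Bayes model with add-one smoothing, skips MongoDB entirely, and uses hypothetical names (MultinomialNaiveBayes, train, predict, accuracy) and toy data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MultinomialNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCountsPerClass = new HashMap<>();
    private final Map<String, Integer> totalTermsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Naive tokenizer (an assumption): lower-case, split on non-word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Adds one labelled document to the model, keeping every occurrence of every term.
    void train(String label, String text) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCountsPerClass.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            totalTermsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns the class with the highest log prior plus sum of smoothed log term likelihoods.
    String predict(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            double score = Math.log(docsPerClass.get(label) / (double) totalDocs);
            Map<String, Integer> counts = termCountsPerClass.get(label);
            int total = totalTermsPerClass.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                int c = counts.getOrDefault(token, 0);
                // Add-one (Laplace) smoothing so unseen terms do not zero out a class.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Percentage of {label, text} pairs whose predicted label matches the given label.
    double accuracy(List<String[]> labelled) {
        int correct = 0;
        for (String[] example : labelled) {
            if (example[0].equals(predict(example[1]))) correct++;
        }
        return 100.0 * correct / labelled.size();
    }

    public static void main(String[] args) {
        MultinomialNaiveBayes nb = new MultinomialNaiveBayes();
        nb.train("sports", "the team won the match");
        nb.train("sports", "a great goal in the final match");
        nb.train("tech", "the new phone has a fast processor");
        nb.train("tech", "software update improves the phone");
        List<String[]> testData = Arrays.asList(
                new String[] {"sports", "they won the final"},
                new String[] {"tech", "fast software on my phone"});
        System.out.println("Accuracy: " + nb.accuracy(testData) + "%");
    }
}
```

With a realistic dataset, the test documents should be kept separate from the training documents, otherwise the accuracy figure looks better than it really is.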

The technology stack I used to train the model and implement this application:

  1. Java 8
  2. MongoDB

Follow this link to find my implementation: https://goo.gl/V5kgkU

Feel free to share your own implementations in the comments or post your questions.

Risk :: Can it be an asset of opportunities?

Why do we think a particular event can be a “Risk”?

Let’s take an example.

When we don’t know how to drive, we think of driving as a risk, and indeed it is not at all safe. Not taking this risk is the correct and wise decision at that moment. After we learn to drive with a trainer’s assistance, driving a car alone can still feel like a risk at first, but with only a little hesitation we soon become confident enough to take it, because we know we have gained sufficient knowledge and some basic practice. The risk is no longer as scary as it was when we didn’t know how to drive at all. Taking a risk with sufficient knowledge of the likely result leads to much more favourable outcomes, and such actions are often called “Calculated Risks”.

In other words, what we call a Risk is often nothing more than a lack of the necessary or basic knowledge about that event.

“Risk is no gamble.”

Yes, it is rightly said. To push our limits and achieve our dreams, we must definitely take risks. But risks taken without direction and calculation often lead to unproductive decisions. Of course, some decisions are very time-critical, and in such situations I personally advise that one should at least be clear about where one wants to go and have one or more reasons that show how important it is to take that risk.

When one gains the necessary knowledge about why a particular “Risk” is worth taking, one also gains the confidence to get through the obstacles that may arise along the way. Eventually, the event we called a “Risk” no longer seems scary and turns into an Opportunity. And since there is no such thing as failure, only learning and a step closer to success, the results of taking that opportunity (or risk) make a person ready and confident to achieve success in life.

By gathering some strong reasons and necessary knowledge about the event, we can convert these Risks into an asset of opportunities.

Que :: Q & A Portal Built Using Ruby on Rails, Marionette JS and SQL

Hi!

Greetings!

Have a look at another open source project of mine, “Que”.

As they say, improvement comes with constant evolution, and evolution is at its best when contributions come from diverse minds. So feel free to fork, clone or modify the project and contribute your updates and ideas.

It is a simple Question and Answer portal built using Ruby on Rails, Backbone JS, Marionette JS and an SQL database.

Source Code: https://github.com/karantongay/Que

The website is live at https://cryptic-stream-87241.herokuapp.com/

It features:

1. Asking Questions

2. Answering the Questions

3. Following and Unfollowing the Users

4. User Profile Section

5. Deleting questions and answers with validation, i.e. only the owner of a question or answer can delete it

This project is open for improvements and contributions.

Splinter :: Java App to Split Large Excel Files

Greetings!!

Recently, I open sourced a small application on GitHub, built using Java, that splits a large Excel file into multiple smaller files with the help of the Apache POI library. It also supports files with the .xlsx extension.

It can be used as a module in existing applications, or as a standalone application once a user interface is added to it.

The project is completely open for contributions and improvements, so feel free to fork, clone or download it, and if you have any contributions, please open a pull request on GitHub.

Please follow this link to the repository:

https://github.com/karantongay/Split-Large-Excel-File-into-smaller-excel-files-using-Apache-POI

Motivation for this project:

Most applications that deal with Excel sheets run into issues when the Excel files are large.

Such files may not open on a computer with limited resources, so they cannot be processed further when someone needs to open the Excel file and verify its contents.

This application handles such large Excel files by splitting them into separate Excel files with a defined number of rows each, which can be opened easily.
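
The core idea of cutting a sheet into parts of a defined number of rows can be sketched without any Excel library at all. The Java snippet below is only an illustration of that row arithmetic, not the code in the repository: RowSplitter and split are hypothetical names, and rows are plain strings here, whereas in the real application each part would be written out to its own workbook through Apache POI.

```java
import java.util.ArrayList;
import java.util.List;

class RowSplitter {
    // Splits rows into consecutive parts of at most rowsPerFile rows each.
    static <T> List<List<T>> split(List<T> rows, int rowsPerFile) {
        if (rowsPerFile <= 0) {
            throw new IllegalArgumentException("rowsPerFile must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += rowsPerFile) {
            int end = Math.min(start + rowsPerFile, rows.size());
            // Copy the slice so each part is independent of the original list.
            parts.add(new ArrayList<>(rows.subList(start, end)));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add("row " + i);
        }
        List<List<String>> parts = split(rows, 4);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("part-" + (i + 1) + ": " + parts.get(i).size() + " rows");
        }
    }
}
```

With 10 rows and 4 rows per file this gives three parts of 4, 4 and 2 rows. The last part is simply whatever is left over.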

Please feel free to share anything you feel you could contribute to make it better.

Happy Coding!

Introduction to Kotlin – A Quick Understanding!

Greetings!

Getting back after a long time, this time with a recent headline: “Kotlin”, an upcoming programming language that offers state-of-the-art features. Let’s get to know it in brief.

Imagine writing code as you would in Java, forgetting to end a line with a semicolon (;) or to include the package header, and the program still runs wonderfully and gives the expected results. Imagine, too, that many possible runtime exceptions are caught before the program ever runs. That is how concise and safe Kotlin aims to be.

Let’s understand quickly what Kotlin exactly is and why it is a good language.

What is Kotlin?

The trend of continuously improving the programming experience has produced another programming language, “Kotlin”, a new language from JetBrains, the company known for its IDEs. Kotlin is a statically typed programming language for modern multiplatform applications. At the Google I/O 2017 conference, the Android team announced first-class support for Kotlin. For Android developers, Kotlin can prove a great resource that helps solve common problems such as runtime exceptions.

Besides Android, Kotlin also supports application development for the JVM, the browser and native platforms.

Why opt for Kotlin?

The core feature of Kotlin is that it compiles to JVM bytecode, so it can use all the currently available Java frameworks and libraries. It also integrates easily with Gradle, Maven and other build systems. Moreover, a programmer can pick up the basics of the language in a few hours simply by reading its documentation.

Secondly, since it has first-class support from Google, developers can feel confident using Kotlin for development.

Also, Kotlin can be introduced into existing projects, which means that existing technology investments and developers’ skills are preserved.

Things are always best understood when we experience them ourselves, so go ahead and try Kotlin online here: https://try.kotlinlang.org

The best feature of this online development environment is that it can convert existing Java code into Kotlin. Try this as well and see how many lines of Java are reduced to far fewer lines of Kotlin.
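
As a small, made-up input to try, here is the kind of Java class that shows the difference well. Person is a hypothetical example class, written out by hand with the usual constructor, getters, equals, hashCode and toString. In Kotlin, roughly the same behaviour comes from a one-line declaration such as data class Person(val name: String, val age: Int), although the exact output of the converter may look different from that hand-written line.

```java
import java.util.Objects;

final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() {
        return name;
    }

    int getAge() {
        return age;
    }

    // Two persons are equal when both their name and age are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person that = (Person) other;
        return age == that.age && Objects.equals(name, that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "Person(name=" + name + ", age=" + age + ")";
    }

    public static void main(String[] args) {
        Person a = new Person("Asha", 30);
        Person b = new Person("Asha", 30);
        System.out.println(a.equals(b)); // true
        System.out.println(a);           // Person(name=Asha, age=30)
    }
}
```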