Applied Machine Learning – things you need to know

Some notes on applying Machine Learning to specific problems: always use train_test_split or similar; use GridSearchCV (built-in cross validation); HDF files need to be shrunk after a write/update, for example with ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc data_in.h5 data_out.h5; use the Keras optimizer instead of TensorFlow's own (so that it can be saved later…
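The first tip boils down to never scoring a model on the rows it was fitted to. Here is a minimal, library-free sketch of what a hold-out split does. The helper name holdout_split and its parameters are hypothetical and only for illustration; in practice scikit-learn's train_test_split does this job for you.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row in the test set.
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = holdout_split(range(20), test_fraction=0.25, seed=42)
    print(len(train), len(test))
```

Fixing the seed is a deliberate choice here: it makes results comparable when you re-run an experiment with different model settings.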

How to get started with Hadoop – Hadoop basics

One of the most painful jobs of a system engineer is building a whole system by installing multiple packages one by one. We all worry about incompatibilities and dependencies. With Hadoop, you get a lot of help from HDP (Hortonworks Data Platform). Great tutorials and documentation can be found here: http://hortonworks.com/hdp/downloads/ The order of methods you…

Big Data references

Some useful websites & courses to learn Big Data:
[1] http://www.columbia.edu/~rsb2162/bigdataeducation.html
[2] http://bigdatauniversity.com/
[3] https://www.coursera.org/
[4] https://www.udacity.com/
[5] http://apache.org
[6] http://hortonworks.com
[7] http://cloudera.com
Updating …

MasterNotDiscoveredException in Elasticsearch

Sometimes, when you want to join a node to an Elasticsearch cluster, this problem may occur (the reason may vary, but I think there are some limitations to using multicast here). Solution: uncomment those lines in elasticsearch.yml. We tell this host (node) to use unicast discovery instead of multicast, and then specify the master host manually…
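As a rough sketch of the "uncomment those lines" step: the excerpt does not show the lines themselves, so the two setting names below are an assumption taken from the stock Elasticsearch 1.x elasticsearch.yml, and the helper uncomment_settings and the host placeholders are hypothetical. Check the config file that ships with your version before relying on them.

```python
# Assumed setting names (stock Elasticsearch 1.x elasticsearch.yml).
DISCOVERY_KEYS = (
    "discovery.zen.ping.multicast.enabled",
    "discovery.zen.ping.unicast.hosts",
)


def uncomment_settings(config_text, keys=DISCOVERY_KEYS):
    """Return config_text with the commented-out lines for `keys` uncommented."""
    out = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            # Drop the leading '#' characters and any space after them.
            candidate = stripped.lstrip("#").lstrip()
            if any(candidate.startswith(key + ":") for key in keys):
                line = candidate
        out.append(line)
    return "\n".join(out)


# Placeholder hosts only; replace them with your real master host(s).
sample = """\
# cluster.name: elasticsearch
# discovery.zen.ping.multicast.enabled: false
# discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
"""

if __name__ == "__main__":
    print(uncomment_settings(sample))
```

Only the two discovery lines are touched; every other commented line is left as it was, and the node has to be restarted for the change to take effect.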

About the Chukwa released versions

I was working with some log collection and aggregation tools from the Apache project. When it came to Chukwa, I read the project’s introduction and release notes and didn’t know what to do, because it seemed like Chukwa had been in and out for a while and was a bit obsolete. So I decided to email the…

Hadoop 2.2 and Flume 1.4 Protobuf Problem and Solution

I have to say a big THANK YOU to the author of “Hadoop in Practice”, Alex Holmes. Source: http://grepalex.com/2014/02/09/flume-and-hadoop-2.2/ The problem you may encounter while trying to integrate Hadoop 2.2 and Flume 1.4 is an incompatibility between protobuf versions:
2014-04-15 13:56:23,251 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:422)] process failed
java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$RecoverLeaseRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;…