-
Applied Machine Learning – things you need to know
Some notes on applying Machine Learning to specific problems: always use train_test_split or something similar, such as GridSearchCV (built-in cross-validation); HDF5 files need to be shrunk after a write/update: ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc data_in.h5 data_out.h5; use the Keras optimizer instead of TensorFlow's own (so that it can be saved later…
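
The ptrepack step in the notes above can be scripted so it runs after every write/update. This is only a minimal sketch, assuming PyTables (which ships the ptrepack tool) is installed; the helper names build_ptrepack_cmd and repack are hypothetical, and the flags are copied from the command above.

```python
import shlex
import shutil
import subprocess


def build_ptrepack_cmd(src, dst, complevel=9, complib="blosc"):
    """Return the ptrepack argument list from the note above (hypothetical helper)."""
    return [
        "ptrepack",
        "--chunkshape=auto",
        "--propindexes",
        f"--complevel={complevel}",
        f"--complib={complib}",
        src,
        dst,
    ]


def repack(src, dst):
    """Run ptrepack if it is on PATH; return True when it was executed."""
    if shutil.which("ptrepack") is None:  # PyTables tools not installed
        return False
    subprocess.run(build_ptrepack_cmd(src, dst), check=True)
    return True


if __name__ == "__main__":
    # Show the command that would be run, without touching any files.
    print(shlex.join(build_ptrepack_cmd("data_in.h5", "data_out.h5")))
```

Writing to a separate output file, as the original command does, keeps the source file intact until the repacked copy has been checked.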
-
How to get started with Hadoop – Hadoop basics
One of the most painful jobs of a system engineer is building a whole system by installing multiple packages one by one. We all worry about incompatibilities and dependencies. With Hadoop, you get a lot of help from HDP (Hortonworks Data Platform). Great tutorials and documentation can be found here: http://hortonworks.com/hdp/downloads/ The order of methods you…
-
Big Data references
Some useful websites & courses to learn Big Data: [1] http://www.columbia.edu/~rsb2162/bigdataeducation.html [2] http://bigdatauniversity.com/ [3] https://www.coursera.org/ [4] https://www.udacity.com/ [5] http://apache.org [6] http://hortonworks.com [7] http://cloudera.com Updating …
-
MasterNotDiscoveredException in Elasticsearch
Sometimes, when you want to join a node to an Elasticsearch cluster, this problem may occur (the reason may vary, but I think there are some limitations to using multicast here). Solution: uncomment those lines in elasticsearch.yml. We tell this host (node) to use unicast discovery instead of multicast, and then specify the master host manually…
-
About the Chukwa released versions
I have been working with some log collection and aggregation tools from the Apache project. When it came to Chukwa, I read the project’s introduction and release notes and didn’t know what to do, because it seemed that Chukwa had been in and out for a while and was a bit obsolete. So I decided to email the…
-
Hadoop 2.2 and Flume 1.4 Protobuf Problem and Solution
A big THANK YOU to Alex Holmes, the author of “Hadoop in Practice”. Source: http://grepalex.com/2014/02/09/flume-and-hadoop-2.2/ The problem you may encounter when integrating Hadoop 2.2 and Flume 1.4 is an incompatibility between protobuf versions: 2014-04-15 13:56:23,251 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:422)] process failed java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$RecoverLeaseRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;…