TRY AND ERROR

Notes on things that caught my interest, things I've studied, and other miscellany. Some posts are written in English.

Reading a Word (docx) file in PHP to get plain text (a bit late to the party)

There are quite a few samples and libraries for generating Word (docx) files,
but this post is about the case where you simply want the plain text inside one.

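A docx file (Word 2007 and later, if I remember correctly) is actually a ZIP archive, so you can extract it with unzip or similar tools.
Inside it is word/document.xml, which holds the text.
If you parse it and loop over each paragraph and then the text elements inside it, you get plain text with line breaks preserved.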

This post uses PHP's ZipArchive, but the approach does not depend on PHP at all.
PHP: ZipArchive - Manual

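<?php
// Return the plain text of a .docx file, one line per paragraph.
function docx_to_text(string $file_path): string
{
	$NL = "\n";
	$contents = "";
	// WordprocessingML namespace of the <w:p> (paragraph) and <w:t> (text) elements.
	$w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

	$zip = new \ZipArchive();
	if ($zip->open($file_path) !== true) {
		return $contents;
	}
	// The body text lives in word/document.xml inside the archive.
	$xml = $zip->getFromName("word/document.xml");
	$zip->close();

	if ($xml !== false) {
		$dom = new \DOMDocument();
		$dom->loadXML($xml);
		foreach ($dom->getElementsByTagNameNS($w, "p") as $p) {
			foreach ($p->getElementsByTagNameNS($w, "t") as $t) {
				$contents .= $t->nodeValue;
			}
			// Add a newline after each paragraph.
			$contents .= $NL;
		}
	}
	return $contents;
}

echo docx_to_text("your.docx");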
English summary

There are many samples and libraries on the web, but this post is about a simple way to get plain text from a docx file.
A docx file is essentially a ZIP archive made of XML parts and relationship (.rels) files, so you can unzip it.
Inside it, word/document.xml holds the document's text.
You can get plain text by parsing that XML and iterating over specific nodes such as paragraphs (w:p) and the text elements (w:t) inside them.
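For comparison, here is a minimal Python port of the same idea using only the standard library (`zipfile` and `xml.etree.ElementTree`). It is a sketch that assumes the standard WordprocessingML namespace for `w:p` and `w:t`; like the PHP version, it ignores headers, footers and footnotes, which live in other parts of the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace of <w:p> (paragraph) and <w:t> (text) elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source) -> str:
    """Return the plain text of a .docx file, one line per paragraph.

    `source` may be a path or a binary file-like object (zipfile accepts both).
    """
    with zipfile.ZipFile(source) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    # iter() walks the whole tree, so paragraphs inside tables are included too.
    for p in root.iter(f"{{{W_NS}}}p"):
        lines.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    # Mirror the PHP version: a newline after every paragraph.
    return "".join(line + "\n" for line in lines)
```

Calling `docx_to_text("your.docx")` returns the same string the PHP function builds.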

Shortening CodeDeploy deployment time to EC2 behind an ALB

When doing in-place deployments with AWS CodeDeploy to EC2 instances behind an ALB, the deployment suddenly became slow once I deployed to production...


In short, the cause was that the health check interval of the production ALB targets was longer than in dev.


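According to the forum thread below, how long it takes to stop traffic to an instance during deployment depends on the health check settings,
so shortening the health check interval just while deploying makes it considerably faster.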

https://forums.aws.amazon.com/thread.jspa?threadID=254752


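For reference, I changed the interval from 30 seconds to 5 seconds (the minimum) and also adjusted the related settings, such as the timeout, to match,
and the deployment time dropped to less than half.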
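Why the interval matters so much can be seen with some rough arithmetic. The sketch below is my own back-of-the-envelope model, not something stated in the forum thread: it assumes a target counts as healthy only after `HealthyThresholdCount` consecutive passing checks spaced one interval apart, and that instances are deployed one at a time. The threshold of 5 and the 4-second timeout used below are illustrative values, not the settings from this deployment.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    interval_s: int         # HealthCheckIntervalSeconds
    timeout_s: int          # HealthCheckTimeoutSeconds
    healthy_threshold: int  # HealthyThresholdCount

    def validate(self) -> None:
        # 5 s is the minimum interval mentioned above; as far as I know,
        # ALB also rejects a timeout that is not shorter than the interval.
        if self.interval_s < 5:
            raise ValueError("interval must be at least 5 seconds")
        if self.timeout_s >= self.interval_s:
            raise ValueError("timeout must be shorter than the interval")

    def seconds_until_healthy(self) -> int:
        # Rough model: N consecutive passing checks, one per interval.
        return self.healthy_threshold * self.interval_s


def estimate_wait(hc: HealthCheck, instances: int) -> int:
    """Health-check wait for a one-at-a-time deployment (other steps ignored)."""
    hc.validate()
    return hc.seconds_until_healthy() * instances
```

With two instances, `estimate_wait(HealthCheck(30, 5, 5), 2)` gives 300 seconds and `estimate_wait(HealthCheck(5, 4, 5), 2)` gives 50. The model ignores every other deployment step, so real savings will be smaller than this ratio suggests.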

My Impressions About Using Vim In Earnest.

At first, I was a light Vim user who only used it in CLI environments.
But I decided to use Vim for everything: development, taking notes, and so on.
There are several reasons, such as Vim's portability, efficiency and extensibility, but the most important one for me is simply that "Vim is cool".

Before using Vim for development in earnest, I thought it would be really confusing and hard to use, but it actually wasn't bad at all.
I migrated smoothly from my usual editor, Atom, to Vim.
Now it's pretty comfortable for me.

I use third-party plugins to improve the Vim experience, and I really respect the people who develop them.
Here are my favorites:

'Shougo/unite.vim'
'Shougo/neomru.vim'
"tyru/caw.vim.git"
'scrooloose/nerdtree'
'jistr/vim-nerdtree-tabs'
'mattn/emmet-vim'
'grep.vim'
'Shougo/neocomplcache'
'Shougo/neosnippet'
'Shougo/neosnippet-snippets'
'tomtom/tcomment_vim'
'surround.vim'
'Townk/vim-autoclose'
'rhysd/accelerated-jk'
'thinca/vim-quickrun'
'ctrlpvim/ctrlp.vim'


This is my .vimrc, though it's still a work in progress.
GitHub - kentaro-a/evil-vim

A handy snippet for showing information about an EC2 instance

If you want to check an EC2 instance's information from inside the instance itself, this snippet may help.
(This assumes Amazon Linux on EC2 with the `aws` command already installed, and an IAM role that allows `ec2:DescribeInstances`.)

For instance, this is a shell function named `me` that prints the instance's specified tag value, instance type, private IP and public IP to the console.

/etc/profile.d/me.sh

# Print this instance's tag value, instance type, private IPv4 and public IPv4.
function me() {
	local md="http://169.254.169.254/latest/meta-data"

	aws ec2 describe-instances \
		--region [region] \
		--instance-ids "$(/usr/bin/curl -s "$md/instance-id")" \
		--query 'Reservations[].Instances[].Tags[?Key==`[tag key]`].Value' \
		--output text

	curl -s "$md/instance-type"
	echo

	curl -s "$md/local-ipv4"
	echo

	curl -s "$md/public-ipv4"
	echo
}


Type `me` in the console.

$ me
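A Python take on the metadata part is sketched below, as an alternative rather than a replacement. It uses the IMDSv2 token flow (a `PUT` to `/latest/api/token`, then the token in the `X-aws-ec2-metadata-token` header), which instances configured to require IMDSv2 need and which the plain `curl` calls above do not send. The tag lookup is left out because it still needs `ec2:DescribeInstances` through the `aws` CLI or an SDK; the `fetch` parameter is a hypothetical hook so the function can be tried off EC2.

```python
import urllib.request
from typing import Callable, Dict

IMDS = "http://169.254.169.254/latest"
FIELDS = ("instance-id", "instance-type", "local-ipv4", "public-ipv4")


def imds_fetch(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata path using an IMDSv2 session token."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


def describe_self(fetch: Callable[[str], str] = imds_fetch) -> Dict[str, str]:
    """Collect the metadata fields the `me` function prints (tag excluded)."""
    info: Dict[str, str] = {}
    for field in FIELDS:
        try:
            info[field] = fetch(field)
        except OSError:
            # HTTPError/URLError are OSErrors: e.g. no public IPv4 (404) or not on EC2.
            info[field] = ""
    return info
```

On an instance, `print(describe_self())` shows the four values; an empty string means the field was unavailable.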

Enabling Basic authentication while still passing the ELB health check (200)

I wanted to put Basic authentication on a service mapped with an Apache virtual host on ELB (ALB) + 2 EC2 instances (Apache).
However, I wanted the ELB health check to keep expecting status 200 and didn't want to touch that part.

To add Basic authentication while changing the AWS and application configuration as little as possible, it seems better not to use the .htaccess in the app directory.
Instead, prepare the virtual host conf or a separate additional configuration file, and in it let requests carrying the ELB health checker's User-Agent through without authentication.

For example:

/etc/httpd/conf.d/vhost.deny_access.conf

<Directory /path/to/app>
	Require valid-user
	AuthType Basic
	AuthName "Please enter your ID and password"
	AuthUserFile /path/to/.htpasswd

	# Satisfy Any turns this into an OR condition: only User-Agents starting with ELB-HealthChecker skip authentication
	Satisfy Any
	Order Deny,Allow

	SetEnvIf User-Agent "^ELB-HealthChecker.*$" noAuth
	Allow from env=noAuth
	Deny from all
</Directory>
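The access decision that `Satisfy Any` produces here can be modeled in a few lines of Python. This only illustrates the OR condition, not how Apache implements it; the plain-text `users` table is a hypothetical stand-in for the hashed entries in `.htpasswd`.

```python
import base64
import re
from typing import Dict, Optional

# Mirrors: SetEnvIf User-Agent "^ELB-HealthChecker.*$" noAuth
HEALTH_CHECK_UA = re.compile(r"^ELB-HealthChecker.*$")


def has_valid_basic_auth(auth_header: Optional[str], users: Dict[str, str]) -> bool:
    """Check an Authorization header against a plain-text user table."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    return bool(sep) and users.get(user) == password


def is_allowed(user_agent: str, auth_header: Optional[str], users: Dict[str, str]) -> bool:
    # Satisfy Any: passing EITHER the env-based Allow OR Require valid-user is enough.
    return bool(HEALTH_CHECK_UA.match(user_agent)) or has_valid_basic_auth(auth_header, users)
```

Because the first branch looks only at the client-supplied User-Agent header, any client that sends an `ELB-HealthChecker...` User-Agent also skips authentication.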

【AWS CodeDeploy】Resource permissions in appspec.yml

I had misunderstood the "permissions" section of appspec.yml in AWS CodeDeploy.
Say we have an app with a directory structure like this:

app
|-src
  |--something...
|-logs
  |--logfiles...

I wanted to set the logs directory's permissions to 757, so I wrote an appspec.yml like this:


appspec.yml

version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html/app
permissions:
  - object: /var/www/html/app/logs
    mode: 757
    type:
      - directory

But it didn't work...
The official AWS documentation says that the permissions section applies to the resources contained in the object you specify, not to the object itself:

type – Optional. The types of objects to which to apply the specified permissions. This can be set to file or directory. If file is specified, the permissions are applied only to files that are immediately contained within object after the copy operation (and not to object itself). If directory is specified, the permissions are recursively applied to all directories/folders that are anywhere within object after the copy operation (but not to object itself).

AppSpec 'permissions' Section (EC2/On-Premises Deployments Only) - AWS CodeDeploy
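To make the quoted rule concrete, here is a small Python model of which paths an entry would touch. It is my own simplification of the documented behavior, not CodeDeploy's code, and it ignores the `pattern` and `except` keys.

```python
import os
from typing import List


def affected_paths(obj: str, kind: str) -> List[str]:
    """Paths a `permissions` entry with this `type` would touch.

    The object itself is never included, per the documentation quoted above.
    """
    if kind == "file":
        # Only files immediately inside the object.
        return sorted(
            os.path.join(obj, name)
            for name in os.listdir(obj)
            if os.path.isfile(os.path.join(obj, name))
        )
    if kind == "directory":
        # Every directory anywhere below the object, recursively.
        found: List[str] = []
        for root, dirs, _files in os.walk(obj):
            found.extend(os.path.join(root, d) for d in dirs)
        return sorted(found)
    raise ValueError(f"unknown type: {kind}")
```

With `logs` as the object and `type: directory`, only subdirectories of `logs` come back (none in the layout above), so `logs` itself is untouched; with the parent `app` directory as the object, `logs` does appear, which is the idea behind targeting a higher directory.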


So I wrote a script that runs chmod on the logs directory and called it from the AfterInstall hook, and of course that worked.
Another approach that would probably work is to specify a directory higher up (such as /) as the object and use pattern matching to select logs.
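A minimal sketch of such an AfterInstall script, written in Python for illustration (the original script isn't shown, and a plain shell `chmod` works just as well). The file name `scripts/fix_permissions.py` is hypothetical, and `LOGS_DIR` assumes the `destination` from the appspec.yml above.

```python
#!/usr/bin/env python3
"""AfterInstall hook (hypothetical scripts/fix_permissions.py):
give the logs directory itself mode 757."""
import os
import stat
import sys

# Assumes files are copied to /var/www/html/app as in the appspec.yml above.
LOGS_DIR = "/var/www/html/app/logs"
MODE = 0o757


def fix_permissions(path: str = LOGS_DIR, mode: int = MODE) -> int:
    # Unlike the `permissions` section, chmod applies to the directory itself.
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else LOGS_DIR
    print(oct(fix_permissions(target)))
```

It would be registered in appspec.yml under `hooks:` and `AfterInstall:` as a `location:` entry pointing at the script, with `runas: root` if the deploying user cannot chmod the directory.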

AWS CodeDeploy logs you should check

We run a web service on AWS in an environment like the one below.
(This time, the instances run Amazon Linux.)

WebServer:
    - Production:
        - ALB * 1
            - EC2 * 2
    - Dev:
        - EC2 * 1
DB:
    - Production: Aurora * 1
    - Dev: Aurora * 1


Our source code is managed in a GitHub organization account,
and we deploy it from GitHub to the two production EC2 instances with AWS CodeDeploy.
A while ago it took us many attempts to get an in-place deployment to succeed, and we hit several errors in the CodeDeploy web console.
If you run into CodeDeploy errors in an environment similar to ours, first check these logs on the EC2 instance; they will probably help you solve the problem.

・/opt/codedeploy-agent/deployment-root
・/var/log/aws/codedeploy-agent
・/var/log/cloud*
・/etc/codedeploy-agent/conf
・/tmp/codedeploy-agent.update.log
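Checking those locations one by one gets tedious, so here is a hypothetical Python helper (not part of CodeDeploy) that collects the last few lines of the most recently modified files under them. The paths are copied from the list above; reading some of them may require `sudo`.

```python
import glob
import os
from collections import deque
from typing import Dict, Iterable, List

# The locations listed above; /var/log/cloud* is a glob pattern.
LOG_PATTERNS = [
    "/opt/codedeploy-agent/deployment-root",
    "/var/log/aws/codedeploy-agent",
    "/var/log/cloud*",
    "/etc/codedeploy-agent/conf",
    "/tmp/codedeploy-agent.update.log",
]


def tail_logs(
    patterns: Iterable[str] = LOG_PATTERNS, lines: int = 20, max_files: int = 10
) -> Dict[str, List[str]]:
    """Last `lines` lines of the `max_files` newest files under `patterns`."""
    files: List[str] = []
    for pattern in patterns:
        for path in glob.glob(pattern):
            if os.path.isdir(path):
                for root, _dirs, names in os.walk(path):
                    files.extend(os.path.join(root, n) for n in names)
            else:
                files.append(path)
    # Keep regular files only (skips broken symlinks), newest first.
    files = [f for f in files if os.path.isfile(f)]
    files.sort(key=os.path.getmtime, reverse=True)

    result: Dict[str, List[str]] = {}
    for path in files[:max_files]:
        try:
            with open(path, errors="replace") as fh:
                # deque keeps only the last `lines` lines in memory.
                result[path] = [ln.rstrip("\n") for ln in deque(fh, maxlen=lines)]
        except OSError:
            continue  # e.g. permission denied when not run with sudo
    return result
```

Iterating over `tail_logs().items()` and printing each path followed by its lines gives a quick overview of what the agent last wrote.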

In addition, confirm the following:

  • The IAM roles attached to the EC2 instances and to CodeDeploy are correct.
  • The EC2 instances' status is healthy.
  • The AWS CodeDeploy agent is installed on the EC2 instances.
  • The GitHub organization account permits access from AWS CodeDeploy.

### Reference.
docs.aws.amazon.com