Code:
cd /d d:\MachineLinerningProject
call D:\MachineLinerningProject\venv36\Scripts\activate
D:\MachineLinerningProject\venv36\Scripts\jupyter notebook
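For some reason, scheduled tasks may run without any problem in development when started through Django's manage.py, but once deployed to production behind nginx + uwsgi they run repeatedly and most runs fail. The cause is that uwsgi starts multiple worker processes to serve requests, and each process starts the scheduled tasks again as it boots. With 4 processes, the tasks are started 4 times; apart from the first start, which may succeed, the rest generally fail.
The fix is not hard in principle: make sure the scheduled tasks are added and run only by the first process that starts, and have every later process skip them. However, Python's inter-process mutual exclusion does not seem to work very well under these conditions. For details, see this passage from the APScheduler FAQ: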
uWSGI employs some tricks which disable the Global Interpreter Lock and with it, the use of threads which are vital to the operation of APScheduler. To fix this, you need to re-enable the GIL using the --enable-threads switch. See the uWSGI documentation for more details. Also, assuming that you will run more than one worker process (as you typically would in production), you should also read the next section.
https://apscheduler.readthedocs.io/en/latest/faq.html#how-can-i-use-apscheduler-with-uwsgi
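One commonly used way to approximate "only the first process starts the scheduler" is an OS-level file lock. The following is a minimal sketch under stated assumptions, not the method from the FAQ: it assumes Python 3 on a POSIX host (it relies on `fcntl`), and the names `try_acquire_scheduler_lock`, `start_scheduler_once` and the lock file path are hypothetical. Whether it behaves well under a particular uwsgi configuration would still need testing, and `--enable-threads` remains necessary as the quote above explains.

```python
import fcntl
import os
import tempfile

# Keep a reference to the open file so the lock lives as long as the process.
_lock_handle = None


def try_acquire_scheduler_lock(lock_path):
    """Return True if this process obtained the exclusive lock, else False."""
    global _lock_handle
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of blocking
        # when another process already holds the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return False
    _lock_handle = handle
    return True


def start_scheduler_once(start_func, lock_path=None):
    """Call start_func only in the process that wins the lock."""
    if lock_path is None:
        # Hypothetical default location for the lock file.
        lock_path = os.path.join(tempfile.gettempdir(), "apscheduler.lock")
    if try_acquire_scheduler_lock(lock_path):
        start_func()
        return True
    return False
```

In a Django project this might be called once at startup, for example `start_scheduler_once(scheduler.start)`, where `scheduler` is an APScheduler scheduler instance created elsewhere. The lock is released automatically when the winning process exits.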
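This tool is for collecting magnet links from The Pirate Bay in bulk, for example from: https://thepiratebay.cr/search/tokyo%20hot
If you want to grab the links and download them with Xunlei, you can use this tool: view the page source, paste it into the text box above, and click the extract button to get all of the magnet links.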
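The extraction step could be sketched in Python roughly as follows. This is an illustration only, not the tool's actual code: the regular expression and the function name `extract_magnet_links` are assumptions, and the input is simply the pasted page source as a string.

```python
import re
from html import unescape

# Matches a magnet URI up to the closing quote, angle bracket or whitespace.
MAGNET_RE = re.compile(r'magnet:\?xt=urn:btih:[^"\'<>\s]+')


def extract_magnet_links(page_source):
    """Return the unique magnet links found in page_source, in page order."""
    links = []
    for match in MAGNET_RE.findall(page_source):
        # Page source usually stores "&" as "&amp;" inside attributes.
        link = unescape(match)
        if link not in links:
            links.append(link)
    return links
```

Calling `extract_magnet_links(source)` on the pasted source returns a list that can be joined with newlines and handed to a download client.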
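There are plenty of guides online for starting uwsgi automatically at boot, and a quick search will turn them up. Here is a brief write-up of the officially recommended approach: starting the service through systemd. This approach requires systemd version 211 or newer, so check that first.
Get the systemd version with the following command: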
root@mars:/etc/systemd/system# systemctl --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN
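If the check needs to be automated, for instance in a deployment script, the version line can be parsed as in the sketch below. The function names are hypothetical, and the threshold of 211 is taken from the requirement stated above.

```python
import re

MIN_SYSTEMD_VERSION = 211  # minimum version stated in the text


def parse_systemd_version(output):
    """Return the version number from `systemctl --version` output, or None."""
    match = re.match(r"systemd\s+(\d+)", output.strip())
    return int(match.group(1)) if match else None


def supports_service_file(output):
    """True if the reported systemd version meets the minimum."""
    version = parse_systemd_version(output)
    return version is not None and version >= MIN_SYSTEMD_VERSION
```

The `output` argument is the text printed by `systemctl --version`, which could be captured with `subprocess.run(["systemctl", "--version"], capture_output=True, text=True).stdout`.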
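Write the service file following the official documentation, place it in the /etc/systemd/system directory, and then run
systemctl start emperor.uwsgi.service to start the service. The service file is as follows: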
[Unit]
Description=uWSGI Emperor
After=syslog.target
[Service]
ExecStart=/usr/local/bin/uwsgi --ini /var/www/html/project/uwsgi.ini
# Requires systemd version 211 or newer
RuntimeDirectory=uwsgi
Restart=always
KillSignal=SIGQUIT
Type=notify
StandardError=syslog
NotifyAccess=all
[Install]
WantedBy=multi-user.target
Two things in this file need attention: the path to the uwsgi executable and the path to the uwsgi.ini configuration file. The executable path can be found with which uwsgi.
Most examples online use the following code:
return HttpResponseForbidden()
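I tried it and the result is underwhelming: no server-side error page is shown, and what the user ends up seeing is the browser's own error page, as in the figure below:
To have the server intercept the error and display its own error page instead, use the following approach: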
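from django.core.exceptions import PermissionDenied

# Inside the view function:
id = request.GET.get('id', '')
timestamp = request.GET.get('timestamp', '')
accesskey = request.GET.get('accesskey', '')
if timestamp == '' or accesskey == '' or id == '':
    # Raising PermissionDenied makes Django answer with a 403 through its handler403 view
    raise PermissionDenied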
For how this works, see this link: https://www.zhihu.com/question/35044484
Below is an implementation in Django:
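import requests
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from requests.exceptions import ConnectionError, ConnectTimeout


@csrf_exempt
def image_proxy(request):
    img = request.GET.get('img')
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36',
    }
    try:
        # Fetch the image server-side so the browser never contacts the origin host
        r = requests.get(img, headers=headers)
    except (ConnectionError, ConnectTimeout):
        # A view must return an HttpResponse, not a bare string
        return HttpResponse(status=502)
    return HttpResponse(r.content, content_type='image/jpeg')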
urls.py:
url(r'^spider-api/image-proxy/$', image_proxy),
Usage, via this URL:
http://127.0.0.1:8001/spider-api/image-proxy/?img=https://mmbiz.qpic.cn/mmbiz_png/WliaoSKPrpSPqGrhMmQK8MwKR6AZ7qDDy2JtSxRjk3ZUke41PUGP6RoaibzIgxw8ey5cejb5FzkplhgGd48oOxAg/640
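The example address works unencoded because the image URL carries no query string of its own. When it does, its `&` characters would be read as separators for the proxy's own parameters, so percent-encoding the `img` value is safer. A minimal sketch, where the helper name `build_proxy_url` is hypothetical:

```python
from urllib.parse import urlencode


def build_proxy_url(proxy_endpoint, image_url):
    """Return proxy_endpoint with image_url as a percent-encoded img parameter."""
    return proxy_endpoint + "?" + urlencode({"img": image_url})
```

Django decodes the value again when the view reads `request.GET.get('img')`, so the view itself needs no change.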
# encoding=utf8
import requests
from bs4 import BeautifulSoup
from bs4.element import Comment

html = requests.get('https://mp.weixin.qq.com/s?src=11&timestamp=1533887718&ver=1051&signature=Xszdx5nmmHyebcH0MXxyHi7-jDwGoNDUDXCHJzPVic68tXGRSTiM3CStUDfSR*aALaC3nK3Ez4e33uLR5ir1pLgy3vEvWXWOvVXgAbsXMn5fB-HWboOW26GH*KMRVhgX&new=1')
soup = BeautifulSoup(html.text, "html5lib")
data = soup.findAll(text=True)


def visible(element):
    # Drop text nodes that a browser never renders
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    # Drop HTML comments
    elif isinstance(element, Comment):
        return False
    return True


# filter() is lazy in Python 3, so build a list that can be reused below
result = list(filter(visible, data))
with open('res.txt', "w+", encoding='utf-8') as p:
    for i in result:
        print(str(i))
        p.write(str(i))
print(result)