
Recognizing text in images with the Baidu API in Python (multithreaded version)

Date: 2022-07-02 10:45:53

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
Created on Tue Jun 12 09:37:38 2018
Recognize text in images with the Baidu API
@author: XnCSD
'''
import datetime
import glob
import os
import threading
from os import path
from queue import Empty, Queue

from aip import AipOcr
from PIL import Image

write_lock = threading.Lock()  # serializes appends to the shared output file


def convertimg(picfile, outdir):
    '''Resize the image, compressing it if it is too large.
    picfile: path of the image
    outdir: directory the image is written to
    '''
    img = Image.open(picfile)
    width, height = img.size
    while width * height > 4000000:  # below this value the compressed image is a little over 200 KB
        width = width // 2
        height = height // 2
    new_img = img.resize((width, height), Image.BILINEAR)
    new_img.save(path.join(outdir, os.path.basename(picfile)))


def baiduOCR(ts_queue):
    '''Recognize text with the Baidu API and save the extracted text.
    ts_queue: queue of image paths to process
    '''
    outfile = 'D:StudypythonProjectscrapyIpProxyport_zidian.txt'  # adjust to your own output path
    APP_ID = ''  # the ID you obtained earlier; likewise for the two keys below
    API_KEY = ''
    SECRECT_KEY = ''
    client = AipOcr(APP_ID, API_KEY, SECRECT_KEY)
    while True:
        try:
            # get_nowait() avoids blocking forever if another thread takes the last item
            picfile = ts_queue.get_nowait()
        except Empty:
            break
        filename = path.basename(picfile)
        with open(picfile, 'rb') as i:
            img = i.read()
        print('Recognizing image:\t' + filename)
        try:
            message = client.basicGeneral(img)  # general text recognition, 50,000 free calls per day
            # message = client.basicAccurate(img)  # high-accuracy general text recognition, 800 free calls per day
            filename1 = filename.split('.')[0]
            with write_lock, open(outfile, 'a+', encoding='utf-8') as fo:
                for text in message['words_result']:
                    fo.writelines("'" + filename1 + "'" + ':' + text.get('words') + ',')
                fo.writelines('\n')
                # Alternative, more verbose output format:
                # fo.writelines('+' * 60 + '\n')
                # fo.writelines('Image recognized:\t' + filename + '\n' * 2)
                # fo.writelines('Text content:\n')
                # for text in message.get('words_result'):
                #     fo.writelines(text.get('words') + '\n')
                # fo.writelines('\n' * 2)
            os.remove(picfile)  # delete the temporary copy (the original removed the bare file name)
            print('Recognition succeeded!')
            print('Text exported successfully!')
            print()
        except Exception as exc:
            print('Recognition failed:', exc)


def duqu_tupian(outdir):
    '''Compress the source images into outdir and return a queue of the resulting paths.'''
    ts_queue = Queue(10000)
    # if path.exists(outfile):
    #     os.remove(outfile)
    if not path.exists(outdir):
        os.mkdir(outdir)
    print('Compressing oversized images...')
    # Compress oversized images first to speed up recognition; the compressed copies go to a temporary folder
    try:
        for picfile in glob.glob(r'D:StudypythonProjectscrapyIpProxy端口*'):  # adjust to your own image folder
            convertimg(picfile, outdir)
        print('Recognizing images...')
        for picfile in glob.glob(path.join(outdir, '*')):
            ts_queue.put(picfile)
        # os.removedirs(outdir)
    except Exception as exc:
        print('Failed:', exc)
    return ts_queue  # always return the queue so the worker threads never receive None


if __name__ == '__main__':
    start = datetime.datetime.now().replace(microsecond=0)
    tmpdir = 'tmp'
    s = duqu_tupian(tmpdir)
    threads = []
    for i in range(100):
        t = threading.Thread(target=baiduOCR, name='th-' + str(i), kwargs={'ts_queue': s})
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('Image text extraction finished! The results are in the output file.')
    end = datetime.datetime.now().replace(microsecond=0)
    print('Elapsed time: ' + str(end - start))
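The worker-thread pattern used by the script can be tried without API credentials. The following is only a minimal sketch: `fake_ocr` is a hypothetical stand-in for `client.basicGeneral()` that returns a dict with the same `words_result` / `words` shape the script reads, and `worker` and `run` are illustrative names that do not appear in the original. It shows each thread draining a shared `Queue` with `get_nowait()`, so a thread exits cleanly once the queue is empty, and a lock protecting the shared result.

```python
import threading
from queue import Empty, Queue


def fake_ocr(name):
    """Hypothetical stand-in for client.basicGeneral(); same response shape."""
    return {'words_result': [{'words': name.upper()}]}


def worker(tasks, results, lock):
    """Take tasks until the queue is empty, then return."""
    while True:
        try:
            name = tasks.get_nowait()  # never blocks; raises Empty when drained
        except Empty:
            return
        message = fake_ocr(name)
        words = [item['words'] for item in message.get('words_result', [])]
        with lock:  # serialize updates to the shared dict
            results[name] = words


def run(names, n_threads=4):
    """Queue all names, process them with n_threads workers, return the results."""
    tasks = Queue()
    for name in names:
        tasks.put(name)
    results, lock = {}, threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    print(run(['a.png', 'b.png', 'c.png']))
```

Checking `empty()` and then calling a blocking `get()` can hang a thread when another thread takes the last item in between; `get_nowait()` with `Empty` avoids that.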

It is fast, with about 99% accuracy: roughly one image in every 100 is recognized incorrectly.

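In a real test on 1,500 images (small, CAPTCHA-sized, high-resolution), the run took 30 seconds; 150 images could not be recognized and about 14 were recognized incorrectly. Overall it is fast, and the output contains no garbled characters.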
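The figures from this test can be turned into rates with a little arithmetic. The sketch below uses only the numbers reported above; `summarize` is an illustrative helper, and it assumes the roughly 14 wrong results are counted among the images that were recognized at all, which the test description does not state explicitly.

```python
def summarize(total, seconds, unrecognized, wrong):
    """Derive throughput and failure rates from the raw test counts."""
    recognized = total - unrecognized
    return {
        'images_per_second': total / seconds,
        'unrecognized_rate': unrecognized / total,
        # Assumption: wrong results are a subset of the recognized images.
        'error_rate_among_recognized': wrong / recognized,
    }


stats = summarize(total=1500, seconds=30, unrecognized=150, wrong=14)

if __name__ == '__main__':
    print(f"{stats['images_per_second']:.0f} images/s")                     # 50 images/s
    print(f"{stats['unrecognized_rate']:.0%} unrecognized")                 # 10% unrecognized
    print(f"{stats['error_rate_among_recognized']:.2%} wrong")              # 1.04% wrong
```

Under that assumption, the error rate among recognized images is about 1%, in line with the "one in 100" estimate, while the 10% of images that return no result is a separate failure mode.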
