-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path211203-Python读取Excel中的图片(二).html
33 lines (33 loc) · 21.1 KB
/
211203-Python读取Excel中的图片(二).html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
<!DOCTYPE html><html lang="zh-CN"><head><meta name="generator" content="Hexo 3.9.0"><meta http-equiv="content-type" content="text/html; charset=utf-8"><meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0" name="viewport"><meta content="yes" name="apple-mobile-web-app-capable"><meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"><meta content="telephone=no" name="format-detection"><meta name="description" content><title>Python读取Excel中的图片(二) | zhiheng's blog</title><link rel="stylesheet" type="text/css" href="/css/style.css?v=1.0.0"><link rel="stylesheet" type="text/css" href="//cdn.jsdelivr.net/npm/normalize.css/normalize.min.css"><link rel="stylesheet" type="text/css" href="//cdn.jsdelivr.net/npm/purecss/build/pure-min.min.css"><link rel="stylesheet" type="text/css" href="//cdn.jsdelivr.net/npm/purecss/build/grids-responsive-min.css"><link rel="stylesheet" href="//cdn.jsdelivr.net/npm/[email protected]/css/font-awesome.min.css"><script type="text/javascript" src="//cdn.jsdelivr.net/npm/jquery/dist/jquery.min.js"></script><link rel="icon" mask sizes="any" href="/img/favicon.ico"><link rel="Shortcut Icon" type="image/x-icon" href="/img/favicon.ico"><link rel="apple-touch-icon" href="/apple-touch-icon.png"><link rel="apple-touch-icon-precomposed" href="/apple-touch-icon.png"><link rel="alternate" type="application/atom+xml" href="/atom.xml"><script type="text/javascript" src="//cdn.jsdelivr.net/npm/clipboard/dist/clipboard.min.js"></script><script type="text/javascript" src="//cdn.jsdelivr.net/gh/codeseven/toastr/build/toastr.min.js"></script><link rel="stylesheet" href="//cdn.jsdelivr.net/gh/codeseven/toastr/build/toastr.min.css"></head><body><div class="body_container"><div id="header"><div class="site-name"><h1 class="hidden">Python读取Excel中的图片(二)</h1><a id="logo" href="/.">zhiheng's blog</a><p class="description">喜于分享,勤于积累;欢迎关注我的微信公众号:治恒说说</p></div><div id="nav-menu"><a class="current" href="/."><i class="fa fa-home"> 首页</i></a><a href="/archives/"><i class="fa fa-archive"> 归档</i></a><a href="/about/"><i class="fa fa-user"> 关于</i></a><a href="/demo/"><i class="fa fa-square"> 有趣的代码</i></a></div></div><div class="pure-g" id="layout"><div class="pure-u-1 pure-u-md-3-4"><div class="content_container"><div class="post"><h1 class="post-title">Python读取Excel中的图片(二)</h1><div class="post-meta">2021年12月03日<span> | </span><span class="category"><a href="/categories/Python/">Python</a></span><script src="https://busuanzi.ibruce.info/busuanzi/2.3/busuanzi.pure.mini.js" async></script><span id="busuanzi_container_page_pv"> | <span id="busuanzi_value_page_pv"></span><span> 阅读</span></span></div><div class="post-content"><p>继上一篇 <a href="https://dddreams.github.io/211116-%E4%BD%BF%E7%94%A8Python%E8%AF%BB%E5%8F%96Excel%E4%B8%AD%E7%9A%84%E5%9B%BE%E7%89%87%E5%B9%B6%E5%AF%B9%E5%BA%94%E5%88%B0%E8%AE%B0%E5%BD%95.html">使用Python读取Excel中的图片并对应到记录</a> ,经过实践之后,又发现了问题,便有了今天这篇。</p>
<h2 id="1、经过实践后发现的问题"><a href="#1、经过实践后发现的问题" class="headerlink" title="1、经过实践后发现的问题"></a>1、经过实践后发现的问题</h2><p>代码经过实践后,发现还是有问题,有些图片还是对应不到相应的记录,于是又开始了一波debugger,发现不是代码的锅,而是Excel解压后<code>drawing1.xml</code>的锅,来看看我们解析xml的代码:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">_f</span><span class="params">(subElementObj)</span>:</span></span><br><span class="line"> <span class="keyword">for</span> anchor <span class="keyword">in</span> subElementObj:</span><br><span class="line"> xdr_from = anchor.getElementsByTagName(<span class="string">'xdr:from'</span>)[<span class="number">0</span>]</span><br><span class="line"> col = xdr_from.childNodes[<span class="number">0</span>].firstChild.data <span class="comment"># 获取标签间的数据</span></span><br><span class="line"> row = xdr_from.childNodes[<span class="number">2</span>].firstChild.data</span><br><span class="line"> embed = anchor.getElementsByTagName(<span class="string">'xdr:pic'</span>)[<span class="number">0</span>].getElementsByTagName(<span class="string">'xdr:blipFill'</span>)[<span class="number">0</span>].getElementsByTagName(<span class="string">'a:blip'</span>)[<span class="number">0</span>].getAttribute(<span class="string">'r:embed'</span>) <span class="comment"># 获取属性</span></span><br><span class="line"> image_info[(int(row), int(col))] = img_dict.get(int(embed.replace(<span class="string">'rId'</span>, <span class="string">''</span>)), {}).get(img_feature)</span><br></pre></td></tr></table></figure>
<p>要解析的xml文档部分内容:</p>
<figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag"><<span class="name">xdr:pic</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">xdr:blipFill</span>></span></span><br><span class="line"> ...</span><br><span class="line"> <span class="tag"><<span class="name">a:blip</span> <span class="attr">r:embed</span>=<span class="string">"rId1"</span> <span class="attr">cstate</span>=<span class="string">"print"</span>></span></span><br><span class="line"> ...</span><br><span class="line"> <span class="tag"></<span class="name">a:blip</span>></span></span><br><span class="line"> <span class="tag"></<span class="name">xdr:blipFill</span>></span></span><br><span class="line"><span class="tag"></<span class="name">xdr:pic</span>></span></span><br></pre></td></tr></table></figure>
<p>获取到<code><a:blip></code>元素的<code>r:embed</code>属性,即对应团片的序号,实际上,如果Excel内容是从其他地方复制过来的,他的序号与图片的序号对应不上,导致的问题,遗憾的是没找到什么原因,不知道Excel中是如何对应的,有兴趣的同学可以研究下。</p>
<h2 id="2、另一种方式的实现"><a href="#2、另一种方式的实现" class="headerlink" title="2、另一种方式的实现"></a>2、另一种方式的实现</h2><p>另一种方式是使用<code>openpyxl</code>和<code>openpyxl_image_loader</code>库,按行读取,loader 图片进行保存,完整代码见:<a href="https://github.com/dddreams/read-excel-image/blob/master/new_read_data.pyhttps://github.com/dddreams/read-excel-image/blob/master/new_read_data.py" target="_blank" rel="noopener">new_read_data.py</a>。</p>
<h2 id="3、新增的需求"><a href="#3、新增的需求" class="headerlink" title="3、新增的需求"></a>3、新增的需求</h2><ul>
<li><p>循环读取在某个目录下的多个文件))</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> root, dirs, files <span class="keyword">in</span> os.walk(source_root):</span><br><span class="line"> <span class="keyword">for</span> file <span class="keyword">in</span> files:</span><br><span class="line"> print(os.path.join(root, file))</span><br></pre></td></tr></table></figure>
</li>
<li><p>leader 要求照片大于200K不入库,于是添加了压缩图片的功能,我将压缩图片的代码分离了出来<a href="https://github.com/dddreams/read-excel-image/blob/master/compress_image.py" target="_blank" rel="noopener">compress_image.py</a>。</p>
</li>
<li><p>将有问题的数据记录下来,写入 Excel,于是有了写入Excel的代码。</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">wb = Workbook()</span><br><span class="line">ws = wb.create_sheet(<span class="string">"存在问题的数据"</span>, <span class="number">0</span>)</span><br><span class="line">index = <span class="number">1</span></span><br><span class="line"><span class="keyword">for</span> i <span class="keyword">in</span> range(len(error_data)):</span><br><span class="line"> index = index + <span class="number">1</span></span><br><span class="line"> arr_list = error_data[i].split(<span class="string">"|"</span>)</span><br><span class="line"> <span class="keyword">for</span> j <span class="keyword">in</span> range(len(arr_list)):</span><br><span class="line"> ws.cell(row = index, column= j+<span class="number">1</span>, value = arr_list[j])</span><br><span class="line">wb.save(target_root + <span class="string">'存在问题的数据.xlsx'</span>)</span><br></pre></td></tr></table></figure>
</li>
<li><p>照片使用电话号码命名,并生成日志,写入文件。</p>
</li>
</ul>
<h2 id="4、存在的问题"><a href="#4、存在的问题" class="headerlink" title="4、存在的问题"></a>4、存在的问题</h2><p>由于原始数据中存在照片未采集的记录,但是提取到的数据中这些记录都有对应的照片,原来<code>image_loader = SheetImageLoader(ws)</code>每次读完不会清空字典,所以就会把上一个文件中对应行的照片读取到当前文件的这一行,经过搜索查找发现是<code>openpyxl-image-loader</code>的问题,相关<code>issues</code>地址:<a href="https://github.com/ultr4nerd/openpyxl-image-loader/issues/9" target="_blank" rel="noopener">images should not be static variable of SheetImageLoader</a> 。所以在每次循环结束将<code>image_loader</code> 清空即可,添加这行代码:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">image_loader._images.clear()</span><br></pre></td></tr></table></figure>
<h2 id="5、通过VB导出图片"><a href="#5、通过VB导出图片" class="headerlink" title="5、通过VB导出图片"></a>5、通过VB导出图片</h2><p>其实提取Excel中的图片可以使用VB实现,直接在Excel的sheet上右键【查看代码】然后粘贴一下代码执行就会将图片导出来,并且能以任一列的值命名。</p>
<figure class="highlight vb"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">Sub</span> 导出图片()</span><br><span class="line"> <span class="keyword">On</span> <span class="keyword">Error</span> <span class="keyword">Resume</span> <span class="keyword">Next</span></span><br><span class="line"> MkDir ThisWorkbook.Path & <span class="string">"\图片"</span></span><br><span class="line"> <span class="keyword">For</span> <span class="keyword">Each</span> pic <span class="keyword">In</span> ActiveSheet.Shapes</span><br><span class="line"> <span class="keyword">If</span> pic.Type = <span class="number">13</span> <span class="keyword">Then</span></span><br><span class="line"> RN = pic.TopLeftCell.Offset(<span class="number">0</span>, <span class="number">-3</span>).Value</span><br><span class="line"> pic.Copy</span><br><span class="line"> <span class="keyword">With</span> ActiveSheet.ChartObjects.Add(<span class="number">0</span>, <span class="number">0</span>, pic.Width, pic.Height).Chart <span class="comment">'创建图片</span></span><br><span class="line"> .Parent.<span class="keyword">Select</span></span><br><span class="line"> .Paste</span><br><span class="line"> .Export ThisWorkbook.Path & <span class="string">"\图片\"</span> & RN & <span class="string">".jpg"</span></span><br><span class="line"> .Parent.Delete</span><br><span class="line"> <span class="keyword">End</span> <span class="keyword">With</span></span><br><span class="line"> <span class="keyword">End</span> <span class="keyword">If</span></span><br><span class="line"> <span class="keyword">Next</span></span><br><span class="line"> MsgBox <span class="string">"导出图片完成! "</span></span><br><span class="line"><span class="keyword">End</span> <span class="keyword">Sub</span></span><br></pre></td></tr></table></figure>
<h2 id="6、总结"><a href="#6、总结" class="headerlink" title="6、总结"></a>6、总结</h2><p>经过不断的折腾,发现条条大路通罗马才是真理,不管你用什么方式实现,发现问题、解决问题才是最重要的经历。</p>
<p><div style="text-align:center;margin:0;" markdown="1"><img src="../img/ddAnswer.jpg" style="margin:0 auto;"></div></p>
<p style="text-align: center;margin:0;">更多文章请关注微信公众号: 治恒说说</p>
</div><div class="tags"><a href="/tags/Python/"><i class="fa fa-tag"></i>Python</a><a href="/tags/Excel/"><i class="fa fa-tag"></i>Excel</a></div><div class="post-nav"><a class="pre" href="/211204-github精选-一款高颜值的Redis客户端.html">github精选-一款高颜值的Redis客户端</a><a class="next" href="/211124-github精选-Linux命令大全.html">github精选-Linux命令大全</a></div></div></div></div><div class="pure-u-1-4 hidden_mid_and_down"><div id="sidebar"><div class="widget"><div class="search-form"><input id="local-search-input" placeholder="Search" type="text" name="q" results="0"><div id="local-search-result"></div></div></div><div class="widget"><div class="widget-title"><i class="fa fa-wechat"> 微信公众号</i></div><div class="tagcloud"><img src="/img/zhihengss.jpg" style="width: 80%"></div></div><div class="widget"><div class="widget-title"><i class="fa fa-folder-o"> 分类</i></div><ul class="category-list"><li class="category-list-item"><a class="category-list-link" href="/categories/Github/">Github</a><span class="category-list-count">5</span></li><li class="category-list-item"><a class="category-list-link" href="/categories/Mysql/">Mysql</a><span class="category-list-count">2</span></li><li class="category-list-item"><a class="category-list-link" href="/categories/Python/">Python</a><span class="category-list-count">2</span></li><li class="category-list-item"><a class="category-list-link" href="/categories/Spring-Boot/">Spring Boot</a><span class="category-list-count">6</span></li></ul></div><div class="widget"><div class="widget-title"><i class="fa fa-star-o"> 标签</i></div><div class="tagcloud"><a href="/tags/杂谈/" style="font-size: 15px;">杂谈</a> <a href="/tags/Kotlin/" style="font-size: 15px;">Kotlin</a> <a href="/tags/Google/" style="font-size: 15px;">Google</a> <a href="/tags/微信/" style="font-size: 15px;">微信</a> <a href="/tags/Photo/" style="font-size: 15px;">Photo</a> <a href="/tags/旅行/" style="font-size: 15px;">旅行</a> <a href="/tags/go/" style="font-size: 15px;">go</a> <a href="/tags/总结/" style="font-size: 15px;">总结</a> <a href="/tags/推荐/" style="font-size: 15px;">推荐</a> <a href="/tags/Spring-Boot/" style="font-size: 15px;">Spring Boot</a> <a href="/tags/Java/" style="font-size: 15px;">Java</a> <a href="/tags/运维/" style="font-size: 15px;">运维</a> <a href="/tags/Mysql/" style="font-size: 15px;">Mysql</a> <a href="/tags/Github/" style="font-size: 15px;">Github</a> <a href="/tags/教程/" style="font-size: 15px;">教程</a> <a href="/tags/软件/" style="font-size: 15px;">软件</a> <a href="/tags/Python/" style="font-size: 15px;">Python</a> <a href="/tags/Excel/" style="font-size: 15px;">Excel</a> <a href="/tags/工具/" style="font-size: 15px;">工具</a> <a href="/tags/JavaScript/" style="font-size: 15px;">JavaScript</a> <a href="/tags/Test/" style="font-size: 15px;">Test</a> <a href="/tags/Chrome/" style="font-size: 15px;">Chrome</a> <a href="/tags/Css/" style="font-size: 15px;">Css</a> <a href="/tags/Html/" style="font-size: 15px;">Html</a> <a href="/tags/设计模式/" style="font-size: 15px;">设计模式</a> <a href="/tags/Web/" style="font-size: 15px;">Web</a> <a href="/tags/React-Native/" style="font-size: 15px;">React-Native</a> <a href="/tags/ES6/" style="font-size: 15px;">ES6</a></div></div><div class="widget"><div class="widget-title"><i class="fa fa-file-o"> 最近文章</i></div><ul class="post-list"><li class="post-list-item"><a class="post-list-link" href="/211206-github精选-github加速访问神器.html">github精选-github加速访问神器</a></li><li class="post-list-item"><a class="post-list-link" href="/211204-github精选-一款高颜值的Redis客户端.html">github精选-一款高颜值的Redis客户端</a></li><li class="post-list-item"><a class="post-list-link" href="/211203-Python读取Excel中的图片(二).html">Python读取Excel中的图片(二)</a></li><li class="post-list-item"><a class="post-list-link" href="/211124-github精选-Linux命令大全.html">github精选-Linux命令大全</a></li><li class="post-list-item"><a class="post-list-link" href="/211120-github精选-PeaZip一款好用又免费的压缩软件.html">github精选-PeaZip一款好用又免费的压缩软件</a></li><li class="post-list-item"><a class="post-list-link" href="/211116-使用Python读取Excel中的图片并对应到记录.html">使用Python读取Excel中的图片并对应到记录</a></li><li class="post-list-item"><a class="post-list-link" href="/211111-SpringRestTemplate的使用.html">Spring RestTemplate的使用</a></li><li class="post-list-item"><a class="post-list-link" href="/211108-github精选-秒杀系统设计与实现.html">github精选-秒杀系统设计与实现</a></li><li class="post-list-item"><a class="post-list-link" href="/210208-Mysql登录失败多次锁定配置.html">Mysql登录失败多次锁定配置</a></li><li class="post-list-item"><a class="post-list-link" href="/210206-SpringBoot-使用JdbcTemplate操作数据库.html">SpringBoot-使用JdbcTemplate操作数据库</a></li></ul></div><div class="widget"><div class="widget-title"><i class="fa fa-external-link"> 友情链接</i></div><ul></ul><a href="https://geektutu.com/" title="极客兔兔" target="_blank">极客兔兔</a><ul></ul><a href="https://www.itmuch.com/" title="周立的博客" target="_blank">周立的博客</a><ul></ul><a href="https://liwenzhou.com/" title="李文周的博客" target="_blank">李文周的博客</a><ul></ul><a href="https://github.com/dddreams/dddreams.github.io/issues" title="添加友链请提交Issues" target="_blank">添加友链请提交Issues</a></div></div></div><div class="pure-u-1 pure-u-md-3-4"><div id="footer">Copyright © 2021 <a href="/." rel="nofollow">zhiheng's blog.</a> Powered by<a rel="nofollow" target="_blank" href="https://hexo.io"> Hexo.</a><a rel="nofollow" target="_blank" href="https://github.com/tufu9441/maupassant-hexo"> Theme</a> by<a rel="nofollow" target="_blank" href="https://github.com/pagecho"> Cho.</a></div></div></div><a class="show" id="rocket" href="#top"></a><script type="text/javascript" src="/js/totop.js?v=1.0.0" async></script><script type="text/javascript" src="//cdn.jsdelivr.net/gh/fancyapps/fancybox/dist/jquery.fancybox.min.js" async></script><script type="text/javascript" src="/js/fancybox.js?v=1.0.0" async></script><link rel="stylesheet" type="text/css" href="//cdn.jsdelivr.net/gh/fancyapps/fancybox/dist/jquery.fancybox.min.css"><link rel="stylesheet" type="text/css" href="/css/search.css?v=1.0.0"><script type="text/javascript" src="/js/search.js?v=1.0.0"></script><script>var search_path = 'search.xml';
if (search_path.length == 0) {
search_path = 'search.xml';
}
var path = '/' + search_path;
searchFunc(path, 'local-search-input', 'local-search-result');
</script><script type="text/javascript" src="/js/copycode.js" successtext="复制成功!"></script><link rel="stylesheet" type="text/css" href="/css/copycode.css"><script type="text/javascript" src="/js/codeblock-resizer.js?v=1.0.0"></script><script type="text/javascript" src="/js/smartresize.js?v=1.0.0"></script></div></body></html>